Show HN: LeanRL: Fast PyTorch RL with torch.compile and CUDA Graphs

53 points

2 years ago

We're excited to announce that we've open-sourced LeanRL, a lightweight PyTorch reinforcement learning library that provides recipes for fast RL training using torch.compile and CUDA graphs. With these tools, we've measured speed-ups of up to 6.8x over the original CleanRL implementations.

Reinforcement learning is notoriously CPU-bound due to the high frequency of small CPU operations. PyTorch's powerful compiler can help alleviate these issues, but comes with its own costs. LeanRL addresses this challenge by providing simple recipes to accelerate your training loop and better utilize your GPU.
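As a rough illustration of the kind of recipe meant here, the sketch below wraps a small action-selection function in torch.compile with mode="reduce-overhead", the mode that asks the compiler to use CUDA graphs when running on a GPU. This is not taken from the LeanRL code: the names build_policy and make_act_fn, the network sizes, and the use_compile flag are all hypothetical and chosen only for the example.

```python
import torch
import torch.nn as nn


def build_policy(obs_dim=4, n_actions=2):
    # Hypothetical tiny policy network; the sizes are arbitrary.
    return nn.Sequential(
        nn.Linear(obs_dim, 32),
        nn.Tanh(),
        nn.Linear(32, n_actions),
    )


def make_act_fn(policy, use_compile=True):
    # Greedy action selection: many small ops that are cheap on the GPU
    # but each cost a Python/CPU dispatch when run eagerly.
    @torch.no_grad()
    def act(obs):
        return policy(obs).argmax(dim=-1)

    if use_compile:
        # "reduce-overhead" lets the compiler use CUDA graphs on GPU,
        # at the price of a compilation/warm-up cost on the first calls.
        act = torch.compile(act, mode="reduce-overhead")
    return act


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    policy = build_policy().to(device)
    act = make_act_fn(policy, use_compile=True)
    obs = torch.randn(8, 4, device=device)
    print(act(obs))
```

The first few calls to the compiled function are slow because they trigger compilation and warm-up, so any gain only shows when the same function is called many times with inputs of the same shape, as in a rollout loop.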

Key results:

- 6.8x speed-up with PPO (Atari)
- 5.7x speed-up with SAC
- 3.4x speed-up with TD3
- 2.7x speed-up with PPO (continuous actions)

Why LeanRL?

- Single-file implementations of RL algorithms with minimal dependencies, in the spirit of gpt-fast
- All optimization tricks are explained in the README: no heavy docs, just simple tricks
- Forked from the popular CleanRL library

Check out LeanRL on https://github.com/pytorch-labs/leanrl now!