I applied the same architecture that made Polars fast: a Rust data plane and a Python control plane, connected via PyO3. The researcher writes Python, while environment stepping, replay buffers, and advantage computation run in Rust with Rayon parallelism. PyTorch handles the neural networks.
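To make "advantage computation" concrete, here is a minimal pure-Python sketch of Generalized Advantage Estimation (GAE), the estimator commonly paired with PPO. This is an illustration only: the function name `compute_gae`, its parameters, and the choice of GAE are my assumptions for this example, not rlox's API or its Rust implementation.

```python
from typing import List, Sequence


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Hypothetical reference GAE over one rollout (not the rlox API).

    values[t] is the critic's estimate for the state at step t, and
    last_value is the estimate for the state after the final step.
    dones[t] is True when the episode ended at step t.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    # Walk the rollout backwards so each step can reuse the next step's result.
    for t in reversed(range(len(rewards))):
        # Stop bootstrapping and accumulating across an episode boundary.
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages


if __name__ == "__main__":
    adv = compute_gae(
        rewards=[1.0, 1.0, 1.0],
        values=[0.0, 0.0, 0.0],
        dones=[False, False, True],
        last_value=0.0,
        gamma=1.0,
        lam=1.0,
    )
    print(adv)
```

The backward loop is sequential within a single rollout, but rollouts from different environments are independent of each other, which is the kind of work that can be spread across threads in a compiled data plane.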
Quick start. Install the package from your shell:
pip install rlox
Then train from Python:
from rlox.trainers import PPOTrainer
trainer = PPOTrainer(env="CartPole-v1")
trainer.train(50_000)