Heykuki News
A minimal hackable implementation of policy gradients (GRPO, PPO, REINFORCE)
github.com/zafstojano
1 point
starzmustdie
5 months ago
No comments yet