Heykuki News
Implementing DeepSeek R1's GRPO algorithm from scratch
github.com/policy-gradient
192 points
xcodevn
a year ago
3 comments