Simple GRPO – RL for 8B models on $10/h GPUs | Heykuki News