Real-time LLM Inference on Standard GPUs (3k tokens/s per request) | Heykuki News