Heykuki News

1. Real-time LLM Inference on Standard GPUs: 3k tokens/s per request (blog.kog.ai)
   218 points | NicoConstant | 9 days ago | 96 comments
2. Real-time LLM Inference on Standard GPUs (3k tokens/s per request) (blog.kog.ai)
   7 points | morgangiraud | 10 days ago | discuss
3. Delayed Tensor Parallelism for Faster Transformer Inference (blog.kog.ai)
   2 points | matt_d | 9 days ago | discuss