Heykuki News

Top | New | Best | Ask | Show | Jobs
1. Real-time LLM Inference on Standard GPUs: 3k tokens/s per request (blog.kog.ai)
   219 points | NicoConstant | 9 days ago | 97 comments
2. Real-time LLM Inference on Standard GPUs (3k tokens/s per request) (blog.kog.ai)
   7 points | morgangiraud | 10 days ago | discuss
3. 3000 tokens/sec LLM playground (playground.kog.ai)
   6 points | rashkov | 9 days ago | 3 comments
4. Delayed Tensor Parallelism for Faster Transformer Inference (blog.kog.ai)
   2 points | matt_d | 9 days ago | discuss