Heykuki News
Top
New
Best
Ask
Show
Jobs
1.
Real-time LLM Inference on Standard GPUs: 3k tokens/s per request
(blog.kog.ai)
219 points
NicoConstant
9 days ago
97 comments
2.
Real-time LLM Inference on Standard GPUs (3k tokens/s per request)
(blog.kog.ai)
7 points
morgangiraud
10 days ago
discuss
3.
3000 tokens/sec LLM playground
(playground.kog.ai)
6 points
rashkov
9 days ago
3 comments
4.
Delayed Tensor Parallelism for Faster Transformer Inference
(blog.kog.ai)
2 points
matt_d
9 days ago
discuss