Heykuki News
A new CUDA kernel for quantized LLMs achieves up to 2.6x latency improvements
github.com/HanGuo97
2 points
radichoml
2 years ago
1 comment