Search: github.com/FMInference | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

1.

FlexGen: Running large language models on a single GPU (github.com/FMInference)

192 points

3 years ago

2.

Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model (github.com/cactus-compute)

774 points

24 days ago

3.

DeepSeek 4 Flash local inference engine for Metal (github.com/antirez)

499 points

a month ago

4.

Atlas TQ1_0 – Pure C++ Ternary (1.58-Bit) Inference Engine for CPU (github.com/xxxn3m3s1sxxx)

3 points

18 days ago

5.

Ternative – C++/CUDA inference engine for ternary LLMs with runtime LoRA (github.com/michelangeloromerochisco)

2 points

17 days ago

6.

Why Gemma-4 26B MoE works in HuggingFace but breaks in prod inference engines (github.com/maeddesg)

1 point

21 days ago

7.

Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks (github.com/antoinezambelli)

197 points

17 days ago

8.

Show HN: I blind-tested 14 LLMs on a WP plugin task. Surprising Findings (github.com/guilamu)

3 points

a month ago