Heykuki News
Top
New
Best
Ask
Show
Jobs
211.
Show HN: Distill – Remove redundant RAG context in 12ms, no LLM calls
2 points
sidk24
5 months ago
discuss
212.
Threads can infect each other with their low priority
(github.com/Dobiasd)
68 points
Dobiasd
6 years ago
35 comments
213.
Llama2.c: Inference llama 2 in one file of pure C
(github.com/karpathy)
707 points
anjneymidha
3 years ago
165 comments
214.
The path to open-sourcing the DeepSeek inference engine
(github.com/deepseek-ai)
550 points
Palmik
a year ago
63 comments
215.
DeepSeek open source DeepEP – library for MoE training and Inference
(github.com/deepseek-ai)
536 points
helloericsf
a year ago
71 comments
216.
DeepSeek 4 Flash local inference engine for Metal
(github.com/antirez)
499 points
tamnd
a month ago
159 comments
217.
Flux 2 Klein pure C inference
(github.com/antirez)
453 points
antirez
5 months ago
141 comments
218.
Gemma.cpp: lightweight, standalone C++ inference engine for Gemma models
(github.com/google)
422 points
mfiguiere
2 years ago
130 comments
219.
BitNet: Inference framework for 1-bit LLMs
(github.com/microsoft)
370 points
redm
3 months ago
167 comments
220.
Exllamav2: Inference library for running LLMs locally on consumer-class GPUs
(github.com/turboderp)
322 points
Palmik
3 years ago
125 comments
221.
Pure C, CPU-only inference with Mistral Voxtral Realtime 4B speech to text model
(github.com/antirez)
311 points
Curiositry
4 months ago
35 comments
222.
Lm.rs: Minimal CPU LLM inference in Rust with no dependency
(github.com/samuel-vitorino)
310 points
littlestymaar
2 years ago
76 comments
223.
Web LLM – WebGPU Powered Inference of Large Language Models
(github.com/mlc-ai)
276 points
summarity
3 years ago
80 comments
224.
Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon
(github.com/RunanywhereAI)
240 points
sanchitmonga22
3 months ago
153 comments
225.
A general-purpose probabilistic programming system with programmable inference
(github.com/probcomp)
238 points
espeed
7 years ago
72 comments
226.
Hypura – A storage-tier-aware LLM inference scheduler for Apple Silicon
(github.com/t8)
221 points
tatef
2 months ago
85 comments
227.
Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA
(github.com/jmaczan)
204 points
yu3zhou4
8 days ago
18 comments
228.
Gluon – A static, type-inferred and embeddable language written in Rust
(github.com/gluon-lang)
203 points
Lapz
8 years ago
94 comments
229.
Llama.rs – Rust port of llama.cpp for fast LLaMA inference on CPU
(github.com/setzer22)
202 points
rrampage
3 years ago
24 comments
230.
Show HN: We made our own inference engine for Apple Silicon
(github.com/trymirai)
186 points
darkolorin
a year ago
46 comments
231.
Microsoft BitNet: inference framework for 1-bit LLMs
(github.com/microsoft)
173 points
galeos
2 years ago
33 comments
232.
Nvidia Dynamo: A Datacenter Scale Distributed Inference Serving Framework
(github.com/ai-dynamo)
150 points
ashvardanian
a year ago
39 comments
233.
LLMLingua: Compressing Prompts for Faster Inferencing
(github.com/microsoft)
149 points
TarqDirtyToMe
2 years ago
47 comments
234.
Show HN: Zero-codegen, no-compile TypeScript type inference from Protobufs
(github.com/nathanhleung)
138 points
18nleung
a year ago
73 comments
235.
Gluon: A static, type inferred and embeddable language written in Rust
(github.com/Marwes)
136 points
jswny
10 years ago
48 comments
236.
Launch HN: Cactus (YC S25) – AI inference on smartphones
(github.com/cactus-compute)
123 points
HenryNdubuaku
9 months ago
63 comments
237.
Node9: Inferno kernel with LuaJIT instead of the Dis virtual machine
(github.com/jvburnes)
116 points
f2f
11 years ago
21 comments
238.
Parakeet.cpp – Parakeet ASR inference in pure C++ with Metal GPU acceleration
(github.com/Frikallo)
114 points
noahkay13
3 months ago
31 comments
239.
C++ GPT-2 inference engine
(github.com/a1k0n)
114 points
version_five
3 years ago
7 comments
240.
Ultra-minimal JSON schemas with TypeScript inference
(github.com/ar-nelson)
103 points
codewithcheese
4 years ago
44 comments