Heykuki News

211.
Show HN: Distill – Remove redundant RAG context in 12ms, no LLM calls
2 points
sidk24
5 months ago
discuss
212.
Threads can infect each other with their low priority (github.com/Dobiasd)
68 points
Dobiasd
6 years ago
35 comments
213.
Llama2.c: Inference llama 2 in one file of pure C (github.com/karpathy)
707 points
anjneymidha
3 years ago
165 comments
214.
The path to open-sourcing the DeepSeek inference engine (github.com/deepseek-ai)
550 points
Palmik
a year ago
63 comments
215.
DeepSeek open source DeepEP – library for MoE training and Inference (github.com/deepseek-ai)
536 points
helloericsf
a year ago
71 comments
216.
DeepSeek 4 Flash local inference engine for Metal (github.com/antirez)
499 points
tamnd
a month ago
159 comments
217.
Flux 2 Klein pure C inference (github.com/antirez)
453 points
antirez
5 months ago
141 comments
218.
Gemma.cpp: lightweight, standalone C++ inference engine for Gemma models (github.com/google)
422 points
mfiguiere
2 years ago
130 comments
219.
BitNet: Inference framework for 1-bit LLMs (github.com/microsoft)
370 points
redm
3 months ago
167 comments
220.
Exllamav2: Inference library for running LLMs locally on consumer-class GPUs (github.com/turboderp)
322 points
Palmik
3 years ago
125 comments
221.
Pure C, CPU-only inference with Mistral Voxtral Realtime 4B speech to text model (github.com/antirez)
311 points
Curiositry
4 months ago
35 comments
222.
Lm.rs: Minimal CPU LLM inference in Rust with no dependency (github.com/samuel-vitorino)
310 points
littlestymaar
2 years ago
76 comments
223.
Web LLM – WebGPU Powered Inference of Large Language Models (github.com/mlc-ai)
276 points
summarity
3 years ago
80 comments
224.
Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon (github.com/RunanywhereAI)
240 points
sanchitmonga22
3 months ago
153 comments
225.
A general-purpose probabilistic programming system with programmable inference (github.com/probcomp)
238 points
espeed
7 years ago
72 comments
226.
Hypura – A storage-tier-aware LLM inference scheduler for Apple Silicon (github.com/t8)
221 points
tatef
2 months ago
85 comments
227.
Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA (github.com/jmaczan)
204 points
yu3zhou4
8 days ago
18 comments
228.
Gluon – A static, type-inferred and embeddable language written in Rust (github.com/gluon-lang)
203 points
Lapz
8 years ago
94 comments
229.
Llama.rs – Rust port of llama.cpp for fast LLaMA inference on CPU (github.com/setzer22)
202 points
rrampage
3 years ago
24 comments
230.
Show HN: We made our own inference engine for Apple Silicon (github.com/trymirai)
186 points
darkolorin
a year ago
46 comments
231.
Microsoft BitNet: inference framework for 1-bit LLMs (github.com/microsoft)
173 points
galeos
2 years ago
33 comments
232.
Nvidia Dynamo: A Datacenter Scale Distributed Inference Serving Framework (github.com/ai-dynamo)
150 points
ashvardanian
a year ago
39 comments
233.
LLMLingua: Compressing Prompts for Faster Inferencing (github.com/microsoft)
149 points
TarqDirtyToMe
2 years ago
47 comments
234.
Show HN: Zero-codegen, no-compile TypeScript type inference from Protobufs (github.com/nathanhleung)
138 points
18nleung
a year ago
73 comments
235.
Gluon: A static, type inferred and embeddable language written in Rust (github.com/Marwes)
136 points
jswny
10 years ago
48 comments
236.
Launch HN: Cactus (YC S25) – AI inference on smartphones (github.com/cactus-compute)
123 points
HenryNdubuaku
9 months ago
63 comments
237.
Node9: Inferno kernel with LuaJIT instead of the Dis virtual machine (github.com/jvburnes)
116 points
f2f
11 years ago
21 comments
238.
Parakeet.cpp – Parakeet ASR inference in pure C++ with Metal GPU acceleration (github.com/Frikallo)
114 points
noahkay13
3 months ago
31 comments
239.
C++ GPT-2 inference engine (github.com/a1k0n)
114 points
version_five
3 years ago
7 comments
240.
Ultra-minimal JSON schemas with TypeScript inference (github.com/ar-nelson)
103 points
codewithcheese
4 years ago
44 comments