Search: github.com/tnfe | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

211.

Show HN: Distill – Remove redundant RAG context in 12ms, no LLM calls

2 points

5 months ago

212.

Threads can infect each other with their low priority (github.com/Dobiasd)

68 points

6 years ago

213.

Llama2.c: Inference llama 2 in one file of pure C (github.com/karpathy)

707 points

3 years ago

214.

The path to open-sourcing the DeepSeek inference engine (github.com/deepseek-ai)

550 points

a year ago

215.

DeepSeek open source DeepEP – library for MoE training and Inference (github.com/deepseek-ai)

536 points

a year ago

216.

DeepSeek 4 Flash local inference engine for Metal (github.com/antirez)

499 points

a month ago

217.

Flux 2 Klein pure C inference (github.com/antirez)

453 points

5 months ago

218.

Gemma.cpp: lightweight, standalone C++ inference engine for Gemma models (github.com/google)

422 points

2 years ago

219.

BitNet: Inference framework for 1-bit LLMs (github.com/microsoft)

370 points

3 months ago

220.

Exllamav2: Inference library for running LLMs locally on consumer-class GPUs (github.com/turboderp)

322 points

3 years ago

221.

Pure C, CPU-only inference with Mistral Voxtral Realtime 4B speech to text model (github.com/antirez)

311 points

4 months ago

222.

Lm.rs: Minimal CPU LLM inference in Rust with no dependency (github.com/samuel-vitorino)

310 points

2 years ago

223.

Web LLM – WebGPU Powered Inference of Large Language Models (github.com/mlc-ai)

276 points

3 years ago

224.

Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon (github.com/RunanywhereAI)

240 points

3 months ago

225.

A general-purpose probabilistic programming system with programmable inference (github.com/probcomp)

238 points

7 years ago

226.

Hypura – A storage-tier-aware LLM inference scheduler for Apple Silicon (github.com/t8)

221 points

2 months ago

227.

Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA (github.com/jmaczan)

204 points

8 days ago

228.

Gluon – A static, type-inferred and embeddable language written in Rust (github.com/gluon-lang)

203 points

8 years ago

229.

Llama.rs – Rust port of llama.cpp for fast LLaMA inference on CPU (github.com/setzer22)

202 points

3 years ago

230.

Show HN: We made our own inference engine for Apple Silicon (github.com/trymirai)

186 points

a year ago

231.

Microsoft BitNet: inference framework for 1-bit LLMs (github.com/microsoft)

173 points

2 years ago

232.

Nvidia Dynamo: A Datacenter Scale Distributed Inference Serving Framework (github.com/ai-dynamo)

150 points

a year ago

233.

LLMLingua: Compressing Prompts for Faster Inferencing (github.com/microsoft)

149 points

2 years ago

234.

Show HN: Zero-codegen, no-compile TypeScript type inference from Protobufs (github.com/nathanhleung)

138 points

a year ago

235.

Gluon: A static, type inferred and embeddable language written in Rust (github.com/Marwes)

136 points

10 years ago

236.

Launch HN: Cactus (YC S25) – AI inference on smartphones (github.com/cactus-compute)

123 points

9 months ago

237.

Node9: Inferno kernel with LuaJIT instead of the Dis virtual machine (github.com/jvburnes)

116 points

11 years ago

238.

Parakeet.cpp – Parakeet ASR inference in pure C++ with Metal GPU acceleration (github.com/Frikallo)

114 points

3 months ago

239.

C++ GPT-2 inference engine (github.com/a1k0n)

114 points

3 years ago

240.

Ultra-minimal JSON schemas with TypeScript inference (github.com/ar-nelson)

103 points

4 years ago