Heykuki News
Top
New
Best
Ask
Show
Jobs
511.
OMLX – Ollama for MLX (LLM Inference Server for Apple Silicon)
(github.com/jundot)
2 points
fintechie
4 months ago
512.
Show HN: Omni-NLI – A multi-interface server for natural language inference
2 points
habedi0
4 months ago
513.
Show HN: EmbodIOS – AI Operating System with Kernel-Level Inference
(github.com/dddimcha)
2 points
dddimcha
5 months ago
514.
Rig: Distributed LLM inference across machines in Rust
(github.com/buyukakyuz)
2 points
corrode2711
5 months ago
515.
Tract: Self-contained, TensorFlow and ONNX inference
(github.com/sonos)
2 points
vishnukvmd
5 months ago
516.
EmbodIOS - AI inference as the operating system (3.5s cold start)
(github.com/dddimcha)
2 points
dddimcha
5 months ago
517.
HF-mem: CLI to estimate inference memory requirements for Hugging Face models
(github.com/alvarobartt)
2 points
handfuloflight
5 months ago
518.
Mini-SGLang: A lightweight yet high-performance inference framework for LLM
(github.com/sgl-project)
2 points
limoce
6 months ago
519.
Go apps can directly integrate llama.cpp for HW accelerated local inference
(github.com/hybridgroup)
2 points
deadprogram
6 months ago
520.
Show HN: Olla – Lightweight LLM Proxy for Homelab and OnPrem AI Inference
2 points
thushanfernando
10 months ago
521.
WebAssembly binding for llama.cpp – Enabling on-browser LLM inference
(github.com/ngxson)
2 points
selvan
a year ago
522.
Show HN: Dwani.ai – multimodal inference API for Indian languages
(dwani.ai)
2 points
gaganyatri
a year ago
523.
GPT4Free: "educational project" for free LLM inference from various services
(github.com/xtekky)
2 points
bobbiechen
a year ago
524.
OmniPainter: Training-Free Stylized Text-to-Image Generation with Fast Inference
(github.com/maxin-cn)
2 points
dvrp
a year ago
525.
GPU-enabled Llama 3 inference in Java from scratch
(github.com/beehive-lab)
2 points
mikepapadim
a year ago
526.
BitNet 1.58bit GPU Inference Kernel
(github.com/microsoft)
2 points
galeos
a year ago
527.
Kubernetes-native distributed LLM inference framework
(github.com/llm-d)
2 points
baijum
a year ago
528.
Show HN: Contextual AI Document Parser – Infer hierarchy for long, complex docs
2 points
ishan_sinha
a year ago
529.
Lambda calculus - compiler, type inference, and evaluator in less than 100 LOC
(gist.github.com)
2 points
tearflake
a year ago
530.
Protobuf-ts-types: zero-codegen TypeScript type inference from protobuf messages
(github.com/nathanhleung)
2 points
18nleung
a year ago
531.
Eagle-3 Speculative Decoding for LLM Inference (5.6x speedup)
(github.com/SafeAILab)
2 points
summarity
a year ago
532.
Show HN: Kernel-level LLM inference via /dev/llm0
(github.com/randombk)
2 points
RandomBK
a year ago
533.
Rust Type Inference Broke with Update to Deranged Crate
(github.com/jhpratt)
2 points
nethunters
a year ago
534.
DeepDive: In-Depth Decryption of LLMs Construction and Inference from Scratch
(github.com/therealoliver)
2 points
therealoliver
a year ago
535.
Show HN: OptiLLMBench – Test how inference optimization tricks scale up LLMs
2 points
codelion
a year ago
536.
Deepseek.cpp: CPU inference for the DeepSeek family of LLMs in pure C++
(github.com/andrewkchan)
2 points
hedgehog0
a year ago
537.
Jlama: LLM Inference Engine for Java
(github.com/tjake)
2 points
saikatsg
a year ago
538.
Show HN: EmbedAnything – Rust Powered Inference, Ingestion and Indexing
(github.com/StarlightSearch)
2 points
Sonam_AI
a year ago
539.
JetStream: Throughput+memory optimized engine for LLM inference on XLA devices
(github.com/google)
2 points
lnyan
2 years ago
540.
Duck-Lisp: optional free-form parenthesis inference
(github.com/oitzujoey)
2 points
nemoniac
2 years ago