Heykuki News
Top
New
Best
Ask
Show
Jobs
631.
High Performance LLM Inference Operator Library from Tencent
(github.com/Tencent)
1 point
polyrand
4 months ago
632.
Show HN: ResourceAI – Local LLM inference optimized for consumer iGPUs
1 point
Fenix46
4 months ago
633.
Show HN: VelinScript 3.0 – a new language with bidirectional type inference
(github.com/SkyliteDesign)
1 point
SkyliteDesign
5 months ago
634.
Fast_topk_batched: High-performance batched Top-K selection for CPU inference
(github.com/RAZZULLIX)
1 point
thunderbong
5 months ago
635.
Show HN: Adaptive-K – Cut MoE inference costs 30-50% with entropy-guided routing
(github.com/Gabrobals)
1 point
Gabrielebalsamo
5 months ago
636.
Inference-Time Constitutional AI
(github.com/mdiskint)
1 point
mdiskint37
5 months ago
637.
WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference
(github.com/Tencent)
1 point
LoveMortuus
5 months ago
638.
Show HN: Binfer, an experimental LLM inference engine in TypeScript and CUDA
(github.com/bwasti)
1 point
brrrrrm
6 months ago
639.
TileRT: Tile-Based Runtime for Ultra-Low-Latency LLM Inference
(github.com/tile-ai)
1 point
simonpure
7 months ago
640.
Pure Go hardware accelerated local inference on VLMs using llama.cpp
(github.com/hybridgroup)
1 point
deadprogram
7 months ago
641.
Show HN: Serverless platform for inference of time-series foundation models
(faim.it.com)
1 point
ChernovAndrei
7 months ago
642.
LitServe: Build custom AI inference engines
(github.com/Lightning-AI)
1 point
wfalcon
7 months ago
643.
Yzma = embedding+inference on VLM/LLM/SLM/TLM in pure Go using llama.cpp
(github.com/hybridgroup)
1 point
deadprogram
8 months ago
644.
Build your own AI model inference engines
(github.com/Lightning-AI)
1 point
wfalcon
8 months ago
645.
Open Retrieval-Based Inference Toolkit
(github.com/schmitech)
1 point
schmitech
10 months ago
646.
Pydantic/GenAI-prices – Calculate prices for calling LLM inference APIs
(github.com/pydantic)
1 point
alexmorley
10 months ago
647.
Show HN: Pure CUDA C Inference for Qwen3 0.6B in One File, No Dependencies
(github.com/gigit0000)
1 point
yb0000
10 months ago
648.
Confidential AI Inference with Attestation: Run LLMs and Agents on TEEs
(github.com/nearai)
1 point
transpute
a year ago
649.
Ask HN: What Inference Server do you use to host TTS Models?
1 point
samagra14
a year ago
650.
ArtificialCast: Type-safe transformation powered by inference
(github.com/Zorokee)
1 point
mpweiher
a year ago
651.
A collection of reproducible LLM inference engine benchmarks: SGLang vs. vLLM
(github.com/Michaelvll)
1 point
zhwu
a year ago
652.
The Path to Open-Sourcing the DeepSeek Inference Engine
(github.com/deepseek-ai)
1 point
xnhbx
a year ago
653.
Show HN: SQL-based inference for Gradient Boosting Models
(github.com/mattismegevand)
1 point
mattismegevand
a year ago
654.
Show HN: Acord – A Daemon for AI Inference
(github.com/alpaca-core)
1 point
bstanimirov
a year ago
655.
Cost-efficient and pluggable Infrastructure components for GenAI inference
(github.com/vllm-project)
1 point
rrampage
a year ago
656.
Cost-efficient and pluggable Infrastructure components for GenAI inference
(github.com/vllm-project)
1 point
delduca
a year ago
657.
Show HN: TokenFlow – Visualize LLM inference speed
(dave.ly)
1 point
davely
a year ago
658.
Show HN: Bodhi App – Local LLM Inference
(getbodhi.app)
1 point
anagri
a year ago
659.
CUDA/Metal accelerated language model inference
(github.com/zeux)
1 point
mooreds
a year ago
660.
Computer vision models inference directly on mobile
(github.com/software-mansion)
1 point
mrys
a year ago