Heykuki News

Top | New | Best | Ask | Show | Jobs
631.
High Performance LLM Inference Operator Library from Tencent (github.com/Tencent)
1 point
polyrand
4 months ago
discuss
632.
Show HN: ResourceAI – Local LLM inference optimized for consumer iGPUs
1 point
Fenix46
4 months ago
discuss
633.
Show HN: VelinScript 3.0 – a new language with bidirectional type inference (github.com/SkyliteDesign)
1 point
SkyliteDesign
5 months ago
discuss
634.
Fast_topk_batched: High-performance batched Top-K selection for CPU inference (github.com/RAZZULLIX)
1 point
thunderbong
5 months ago
discuss
635.
Show HN: Adaptive-K – Cut MoE inference costs 30-50% with entropy-guided routing (github.com/Gabrobals)
1 point
Gabrielebalsamo
5 months ago
discuss
636.
Inference-Time Constitutional AI (github.com/mdiskint)
1 point
mdiskint37
5 months ago
discuss
637.
WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference (github.com/Tencent)
1 point
LoveMortuus
5 months ago
discuss
638.
Show HN: Binfer, an experimental LLM inference engine in TypeScript and CUDA (github.com/bwasti)
1 point
brrrrrm
6 months ago
discuss
639.
TileRT: Tile-Based Runtime for Ultra-Low-Latency LLM Inference (github.com/tile-ai)
1 point
simonpure
7 months ago
discuss
640.
Pure Go hardware accelerated local inference on VLMs using llama.cpp (github.com/hybridgroup)
1 point
deadprogram
7 months ago
discuss
641.
Show HN: Serverless platform for inference of time-series foundation models (faim.it.com)
1 point
ChernovAndrei
7 months ago
discuss
642.
LitServe: Build custom AI inference engines (github.com/Lightning-AI)
1 point
wfalcon
7 months ago
discuss
643.
Yzma = embedding+inference on VLM/LLM/SLM/TLM in pure Go using llama.cpp (github.com/hybridgroup)
1 point
deadprogram
8 months ago
discuss
644.
Build your own AI model inference engines (github.com/Lightning-AI)
1 point
wfalcon
8 months ago
discuss
645.
Open Retrieval-Based Inference Toolkit (github.com/schmitech)
1 point
schmitech
10 months ago
discuss
646.
Pydantic/GenAI-prices – Calculate prices for calling LLM inference APIs (github.com/pydantic)
1 point
alexmorley
10 months ago
discuss
647.
Show HN: Pure CUDA C Inference for Qwen3 0.6B in One File, No Dependencies (github.com/gigit0000)
1 point
yb0000
10 months ago
discuss
648.
Confidential AI Inference with Attestation: Run LLMs and Agents on TEEs (github.com/nearai)
1 point
transpute
a year ago
discuss
649.
Ask HN: What Inference Server do you use to host TTS Models?
1 point
samagra14
a year ago
discuss
650.
ArtificialCast: Type-safe transformation powered by inference (github.com/Zorokee)
1 point
mpweiher
a year ago
discuss
651.
A collection of reproducible LLM inference engine benchmarks: SGLang vs. vLLM (github.com/Michaelvll)
1 point
zhwu
a year ago
discuss
652.
The Path to Open-Sourcing the DeepSeek Inference Engine (github.com/deepseek-ai)
1 point
xnhbx
a year ago
discuss
653.
Show HN: SQL-based inference for Gradient Boosting Models (github.com/mattismegevand)
1 point
mattismegevand
a year ago
discuss
654.
Show HN: Acord – A Daemon for AI Inference (github.com/alpaca-core)
1 point
bstanimirov
a year ago
discuss
655.
Cost-efficient and pluggable Infrastructure components for GenAI inference (github.com/vllm-project)
1 point
rrampage
a year ago
discuss
656.
Cost-efficient and pluggable Infrastructure components for GenAI inference (github.com/vllm-project)
1 point
delduca
a year ago
discuss
657.
Show HN: TokenFlow – Visualize LLM inference speed (dave.ly)
1 point
davely
a year ago
discuss
658.
Show HN: Bodhi App – Local LLM Inference (getbodhi.app)
1 point
anagri
a year ago
discuss
659.
CUDA/Metal accelerated language model inference (github.com/zeux)
1 point
mooreds
a year ago
discuss
660.
Computer vision models inference directly on mobile (github.com/software-mansion)
1 point
mrys
a year ago
discuss