Heykuki News

TopNewBestAskShowJobs
601.
ParaAttention: Speed Up Flux and Mochi Inference with Multiple GPUs (github.com/chengzeyi)
1 point
chengzeyi
2 years ago
1 comment
602.
Llama Deck: CLI for running multiple language implementations of LLM inference (github.com/xxxbf0222)
1 point
mikepapadim
2 years ago
1 comment
603.
Benchmarked Llama2 and Mistral across popular inference engines and precisions (github.com/premAI-io)
1 point
anindya2002
2 years ago
1 comment
604.
Curated List of 50 Open-Source LLM Inference Tools: Seeking Contributions (github.com/vince-lam)
1 point
vincelam
2 years ago
1 comment
605.
Show HN: Fortran inference code for the Mamba state space language model (github.com/rbitr)
1 point
andy99
2 years ago
1 comment
606.
GPT-Fast: Simple and efficient GPT inference in <1000 LOC of Python (github.com/pytorch-labs)
1 point
Palmik
3 years ago
1 comment
607.
Generate Nix packages from URLs with hash prefetching and dependency inference (github.com/nix-community)
1 point
figsoda
3 years ago
1 comment
608.
Show HN: Kylo – Simple FAQ Bot Built with Facebook's Infersent (github.com/avinassh)
1 point
avinassh
7 years ago
1 comment
609.
Clevr-Iep: Inferring and Executing Programs for Visual Reasoning (github.com/facebookresearch)
1 point
runesoerensen
9 years ago
1 comment
610.
XcodeGhost infected Apps List (github.com/zengyun-programmer)
1 point
dengjh
11 years ago
1 comment
611.
Configurable zombie infection simulation (github.com/Ellzord)
1 point
javinpaul
11 years ago
discuss
612.
Why Gemma-4 26B MoE works in HuggingFace but breaks in prod inference engines (github.com/maeddesg)
1 point
maeddesg
23 days ago
discuss
613.
Show HN: Gosd: High-performance Stable Diffusion inference in pure Go (no CGO) (github.com/l8bloom)
1 point
krakato
a month ago
discuss
614.
WebLLM is a high-performance in-browser LLM inference engine (github.com/mlc-ai)
1 point
doener
a month ago
discuss
615.
Rcarmo/gte-go: Golang inference for the GTE Small embedding model (github.com/rcarmo)
1 point
rcarmo
a month ago
discuss
616.
Show HN: JibarOS, a shared inference runtime for Android (github.com/Jibar-OS)
1 point
rafaelvalle03
2 months ago
discuss
617.
ORAC-NT MedChem Copilot that blocks synthetically infeasible molecules (github.com/Kretski)
1 point
DREDREG
2 months ago
discuss
618.
New ML inference language dropped today (github.com/m0at)
1 point
sfffs
2 months ago
discuss
619.
QuantumLeap: 2.3× faster MoE inference with intelligent expert caching (github.com/MartinCrespoC)
1 point
ikharoz
2 months ago
discuss
620.
Show HN: Mamba SSM in Rust – training and inference with custom CUDA kernels (github.com/silvermpx)
1 point
silvermpx
3 months ago
discuss
621.
Show HN: Go LLM inference with a Vulkan GPU back end that beats Ollama's CUDA (github.com/computerex)
1 point
computerex
3 months ago
discuss
622.
Speculative Speculative Decoding: Really, Really Fast LLM Inference (github.com/tanishqkumar)
1 point
fizzbuzz07
3 months ago
discuss
623.
Show HN: SAM 3 Inference on Modal in Under 10 Seconds (github.com/TheFloatingString)
1 point
larryll
3 months ago
discuss
624.
Show HN: oMLX – Native Mac inference server that persists KV cache to SSD (github.com/jundot)
1 point
jundot
4 months ago
discuss
625.
MicroGPT: train & inference in 243 lines of code (gist.github.com)
1 point
RyanShook
4 months ago
discuss
626.
MicroGPT - Train and inference a GPT in pure, dependency-free Python (200 lines) (gist.github.com)
1 point
susam
4 months ago
discuss
627.
Show HN: ARIA Protocol – P2P distributed 1-bit LLM inference at 120 tok/s on CPU (github.com/spmfrance-cloud)
1 point
anthonymu
4 months ago
discuss
628.
Show HN: ARIA – P2P distributed inference protocol for 1-bit LLMs on CPU (github.com/spmfrance-cloud)
1 point
anthonymu
4 months ago
discuss
629.
Show HN: Weed – Minimalist AI/ML inference and backpropagation in the style of Qrack (github.com/vm6502q)
1 point
wrathfulspatula
4 months ago
discuss
630.
PowerInfer: Fast LLM Inference on a Consumer-Grade GPU (github.com/Tiiny-AI)
1 point
oldfuture
4 months ago
discuss