Heykuki News

61.
Show HN: Parakeet LLM Demo (378M param. 8GB VRAM)
2 points
razodactyl
2 years ago
discuss
62.
Adjust VRAM/RAM Split on Apple Silicon (github.com/ggerganov)
1 point
tosh
3 years ago
1 comment
63.
2.3x KV Cache Compression at 32k Context – Cut VRAM Costs by 50% (github.com/Jamie2111)
1 point
JamieObala
21 days ago
discuss
64.
Show HN: QKV Core – Run 7B LLMs on 4GB VRAM via surgical memory alignment (github.com/QKV-Core)
1 point
broxytr
6 months ago
discuss
65.
Super Merryo Trolls: An Adventure from the Days Before VRAM (github.com/GBirkel)
1 point
vatys
2 years ago
discuss
66.
Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks (github.com/antoinezambelli)
686 points
zambelli
17 days ago
252 comments
67.
Show HN: InvokeAI, an open source Stable Diffusion toolkit and WebUI (github.com/invoke-ai)
414 points
sophrocyne
4 years ago
102 comments
68.
Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training (github.com/alainnothere)
265 points
xlayn
3 months ago
80 comments
69.
Launch HN: Deepsilicon (YC S24) – Software and hardware for ternary transformers
189 points
areddyyt
2 years ago
79 comments
70.
Show HN: ZSE – Open-source LLM inference engine with 3.9s cold starts (github.com/Zyora-Dev)
58 points
zyoralabs
3 months ago
9 comments
71.
Show HN: I built a RISC-V emulator that runs DOOM (github.com/lalitshankarch)
50 points
Flex247A
a month ago
4 comments
72.
Show HN: Local task classifier and dispatcher on RTX 3080 (github.com/resilientworkflowsentinel)
26 points
Shubham_Amb
4 months ago
2 comments
73.
Show HN: KTransformers–236B Model and 1M Context LLM Inference on Local Machines (github.com/kvcache-ai)
20 points
sssummer
2 years ago
3 comments
74.
Show HN: Finetune Llama-3.1 2x faster in a Colab (colab.research.google.com)
16 points
danielhanchen
2 years ago
2 comments
75.
Show HN: Salad, a distributed cloud for AI (like Airbnb for GPUs)
15 points
bobjmiles
2 years ago
4 comments
76.
Show HN: KTransformers:671B DeepSeek-R1 on a Single Machine-286 tokens/s Prefill (github.com/kvcache-ai)
14 points
sssummer
a year ago
discuss
77.
Show HN: Willow Inference Server: Optimized ASR/TTS/LLM for Willow/WebRTC/REST (github.com/toverainc)
13 points
kkielhofner
3 years ago
13 comments
78.
Show HN: Lightweight Llama3 Inference Engine – CUDA C (github.com/abhisheknair10)
12 points
abhisheknair10
a year ago
discuss
79.
Show HN: Automatic 1111, but as a Python Package (github.com/saketh12)
11 points
saketh105
2 years ago
discuss
80.
Show HN: Coderive – Iterating through 1 Quintillion Inside a Loop in just 50ms (github.com/DanexCodr)
8 points
DanexCodr
5 months ago
13 comments
81.
Show HN: onprem unstructured data extraction with 4 lines of code (github.com/NanoNets)
8 points
souvik3333
a year ago
discuss
82.
Show HN: Local GLaDOS (old.reddit.com)
8 points
dnhkng
2 years ago
discuss
83.
Show HN: WaveletLM – wavelet-based, attention-free model with O(n log n) scaling (github.com/ramongougis)
7 points
anarmorarm
a month ago
1 comment
84.
Show HN: Serve 100 Large AI models on a single GPU with low impact to TTFT (github.com/leoheuler)
7 points
leonheuler
7 months ago
1 comment
85.
Show HN: Federation of robots collaboratively train an object manipulation model (github.com/adap)
7 points
jafermarq
a year ago
discuss
86.
Show HN: Chonkie Cloud – No-nonsense chunking now on the cloud (cloud.chonkie.ai)
6 points
snyy
a year ago
5 comments
87.
Show HN: OS Megakernel that match M5 Max Tok/w at 2x the Throughput on RTX 3090 (github.com/Luce-Org)
6 points
GreenGames
2 months ago
1 comment
88.
Show HN: Blink-Edit – Cursor-style next-edit predictions for Neovim (local LLMs) (github.com/BlinkResearchLabs)
6 points
atemyipod
4 months ago
discuss
89.
Show HN: I'm tired of my LLM bullshitting. So I fixed it
5 points
BobbyLLM
4 months ago
9 comments
90.
Show HN: AI Council – multi-model deliberation that runs in the browser (github.com/prijak)
5 points
prijak
3 months ago
1 comment