Search: github.com/vrza | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

61.

Show HN: Parakeet LLM Demo (378M param. 8GB VRAM)

2 points

2 years ago

62.

Adjust VRAM/RAM Split on Apple Silicon (github.com/ggerganov)

1 point

3 years ago

63.

2.3x KV Cache Compression at 32k Context – Cut VRAM Costs by 50% (github.com/Jamie2111)

1 point

21 days ago

64.

Show HN: QKV Core – Run 7B LLMs on 4GB VRAM via surgical memory alignment (github.com/QKV-Core)

1 point

6 months ago

65.

Super Merryo Trolls: An Adventure from the Days Before VRAM (github.com/GBirkel)

1 point

2 years ago

66.

Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks (github.com/antoinezambelli)

686 points

17 days ago

67.

Show HN: InvokeAI, an open source Stable Diffusion toolkit and WebUI (github.com/invoke-ai)

414 points

4 years ago

68.

Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training (github.com/alainnothere)

265 points

3 months ago

69.

Launch HN: Deepsilicon (YC S24) – Software and hardware for ternary transformers

189 points

2 years ago

70.

Show HN: ZSE – Open-source LLM inference engine with 3.9s cold starts (github.com/Zyora-Dev)

58 points

3 months ago

71.

Show HN: I built a RISC-V emulator that runs DOOM (github.com/lalitshankarch)

50 points

a month ago

72.

Show HN: Local task classifier and dispatcher on RTX 3080 (github.com/resilientworkflowsentinel)

26 points

4 months ago

73.

Show HN: KTransformers–236B Model and 1M Context LLM Inference on Local Machines (github.com/kvcache-ai)

20 points

2 years ago

74.

Show HN: Finetune Llama-3.1 2x faster in a Colab (colab.research.google.com)

16 points

2 years ago

75.

Show HN: Salad, a distributed cloud for AI (like Airbnb for GPUs)

15 points

2 years ago

76.

Show HN: KTransformers:671B DeepSeek-R1 on a Single Machine-286 tokens/s Prefill (github.com/kvcache-ai)

14 points

a year ago

77.

Show HN: Willow Inference Server: Optimized ASR/TTS/LLM for Willow/WebRTC/REST (github.com/toverainc)

13 points

3 years ago

78.

Show HN: Lightweight Llama3 Inference Engine – CUDA C (github.com/abhisheknair10)

12 points

a year ago

79.

Show HN: Automatic 1111, but as a Python Package (github.com/saketh12)

11 points

2 years ago

80.

Show HN: Coderive – Iterating through 1 Quintillion Inside a Loop in just 50ms (github.com/DanexCodr)

8 points

5 months ago

81.

Show HN: onprem unstructured data extraction with 4 lines of code (github.com/NanoNets)

8 points

a year ago

82.

Show HN: Local GLaDOS (old.reddit.com)

8 points

2 years ago

83.

Show HN: WaveletLM – wavelet-based, attention-free model with O(n log n) scaling (github.com/ramongougis)

7 points

a month ago

84.

Show HN: Serve 100 Large AI models on a single GPU with low impact to TTFT (github.com/leoheuler)

7 points

7 months ago

85.

Show HN: Federation of robots collaboratively train an object manipulation model (github.com/adap)

7 points

a year ago

86.

Show HN: Chonkie Cloud – No-nonsense chunking now on the cloud (cloud.chonkie.ai)

6 points

a year ago

87.

Show HN: OS Megakernel that matches M5 Max Tok/w at 2x the Throughput on RTX 3090 (github.com/Luce-Org)

6 points

2 months ago

88.

Show HN: Blink-Edit – Cursor-style next-edit predictions for Neovim (local LLMs) (github.com/BlinkResearchLabs)

6 points

4 months ago

89.

Show HN: I'm tired of my LLM bullshitting. So I fixed it

5 points

4 months ago

90.

Show HN: AI Council – multi-model deliberation that runs in the browser (github.com/prijak)

5 points

3 months ago