The result is a C++ re-implementation of the inference part of Andrej Karpathy's nanochat (https://github.com/karpathy/nanochat), built on top of ggml. Unlike llama.cpp, this isn't a standalone binary; it is a C++ library with a Python wrapper, designed to swap out some core classes within the nanochat pipeline. To make it easy to play with, I’ve kept the dependencies to a minimum: just ggml, nanobind, and gtest for unit tests.
Features and limitations:
- A drop-in replacement for nanochat’s `GPT` and `KVCache` classes. So far I’ve only tested this with `chat_web.py`. You can see how it's integrated here: https://github.com/k-ye/nanochat/pull/1
- Runs on CPU and GPU (Metal works; CUDA probably works, but I haven't verified it).
- Handles PyTorch-to-GGUF conversion automatically on demand.
- Only float32 is currently supported.
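To illustrate what "on demand" conversion could look like, here is a minimal sketch in Python. It is not the project's actual code: the `ensure_gguf` helper, the `.gguf` file placed next to the checkpoint, and the `convert` callback are all hypothetical names chosen for illustration. The idea is simply to convert once, cache the result, and reconvert only when the checkpoint is newer than the cached file.

```python
import os
from pathlib import Path
from typing import Callable


def ensure_gguf(checkpoint: Path, convert: Callable[[Path, Path], None]) -> Path:
    """Return a GGUF path for `checkpoint`, converting only when needed.

    `convert(src, dst)` is a hypothetical callback that writes a GGUF file
    to `dst` from the PyTorch checkpoint at `src`.
    """
    gguf_path = checkpoint.with_suffix(".gguf")

    # Reuse the cached file if it is at least as new as the checkpoint.
    if gguf_path.exists() and gguf_path.stat().st_mtime >= checkpoint.stat().st_mtime:
        return gguf_path

    # Convert into a temporary file first, then rename it into place, so an
    # interrupted conversion never leaves a half-written cache entry behind.
    tmp_path = gguf_path.with_name(gguf_path.name + ".tmp")
    convert(checkpoint, tmp_path)
    os.replace(tmp_path, gguf_path)
    return gguf_path
```

A caller would pass the checkpoint path and whatever converter it has; the first call pays the conversion cost and later calls return the cached path immediately.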
Benchmark:
On an M3 Max (Metal), throughput is roughly 1/3 that of the original PyTorch implementation. I haven’t profiled the code yet, but I suspect the bottleneck is the lack of bf16 support.
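A rough back-of-the-envelope model for why dtype could matter: if token-by-token decoding is memory-bandwidth bound (an assumption, not something I have measured), every generated token has to read all the weights once, so halving the bytes per weight roughly doubles the throughput ceiling. The parameter count and bandwidth below are placeholder values for illustration, not measurements of nanochat or the M3 Max.

```python
# Bytes used to store a single weight in each dtype.
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Total bytes needed to store `num_params` weights in `dtype`."""
    return num_params * BYTES_PER_ELEMENT[dtype]


def max_tokens_per_second(num_params: int, dtype: str, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed if each token reads every weight once."""
    return bandwidth_bytes_per_s / weight_bytes(num_params, dtype)


# Placeholder inputs (assumptions for illustration only).
NUM_PARAMS = 500_000_000   # hypothetical model size
BANDWIDTH = 300e9          # hypothetical memory bandwidth, bytes per second

fp32_ceiling = max_tokens_per_second(NUM_PARAMS, "float32", BANDWIDTH)
bf16_ceiling = max_tokens_per_second(NUM_PARAMS, "bfloat16", BANDWIDTH)
print(f"float32 ceiling:  {fp32_ceiling:.0f} tokens/s")
print(f"bfloat16 ceiling: {bf16_ceiling:.0f} tokens/s")
```

Under this simple model, dtype alone accounts for at most a 2x gap, so a 3x gap would likely have other contributors as well; profiling is the only way to know for sure.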
Motivation:
- Writing meaningful (& fun) C++ again: I used to spend a lot of my day-to-day time in C++ while working at various tech companies. These days, opportunities to use it for personal projects are rare, as it’s often hard to find a use case where C++'s advantages truly matter.
- Testing "Vibe Coding" capabilities: Most of my current work is in UE5. Ironically, Blueprints, which were designed to help non-coders, have become a bottleneck in the LLM era... Admittedly, AI agents have given me some FOMO, and I wanted to see if AI could handle a lower-level C++ implementation of a complex system from scratch.
- Understanding LLM internals.
Why nanochat?
It hits the "Goldilocks" zone: popular enough to be relevant, concise enough to be educational, and practical enough to deserve a serious C++ implementation.
If you’re like me, an infra guy from the old days who feels a bit threatened by LLMs and/or AI coding, I think nanochat is a great reference. Tinkering with it however you like is a nice way to demystify the tech. I relied heavily on Claude Code (CC) for the implementation. Overall, I am both impressed and genuinely pleased with the experience.
Happy to answer questions, hear feedback, or discuss AI coding further!