Show HN: Jax and Flax LLMs – Transformer Implementations Optimized for TPUs

a year ago

I've open-sourced awesome-jax-flax-llms, a curated collection of large language model (LLM) implementations built from scratch using JAX and Flax. The repo is designed for high-performance training on TPUs/GPUs, making it ideal for researchers, ML engineers, and curious tinkerers looking to explore or extend modern transformer models.

Key Features:

- Modular, readable, and extensible codebase
- Implementations of GPT-2 and LLaMA 3 in pure JAX/Flax
- Accelerated training with XLA + Optax
- Google Colab support (TPU-ready)
- Hugging Face dataset integration
- Upcoming support for fine-tuning, Mistral, and DeepSeek-R

This is primarily an educational resource, but it's written with performance in mind and can be adapted for more serious use. Contributions are welcome, whether you’re improving performance, adding new models, or experimenting with different attention mechanisms.