I wanted a library that is fast and feature-rich enough to train real models while being simple enough that anyone with a bit of Python experience can understand what is going on.
The biggest milestone so far is training GPT-2 (124M) on 2.3B tokens in just under 3 days on my GPU (RTX 3090).
So far, I've added the following to Tricycle:
- An automatic differentiation engine
- General matrix operations with einsum
- Standard network layers (Dense, ReLU, GeLU, etc.)
- Transformer blocks (MultiHeadSelfAttention and MLP blocks)
- Optimisers (SGD, AdamW)
- GPT-2
- etc
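
For anyone new to the first item in that list, here is a minimal sketch of reverse-mode automatic differentiation on scalars. It is illustrative only and is not Tricycle's actual API: the `Value` class and its methods are hypothetical names chosen for this example, and a real engine works on tensors rather than single floats.

```python
class Value:
    """A scalar that remembers how it was computed (hypothetical, not Tricycle's API)."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = float(data)
        self.grad = 0.0
        self.parents = parents    # inputs this value was computed from
        self.grad_fns = grad_fns  # one function per parent: upstream grad -> parent grad

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def relu(self):
        return Value(max(self.data, 0.0), (self,),
                     (lambda g: g if self.data > 0 else 0.0,))

    def backward(self):
        # Build a topological order so each node is processed only after
        # every node that consumes it has contributed to its gradient.
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent in node.parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, grad_fn in zip(node.parents, node.grad_fns):
                parent.grad += grad_fn(node.grad)


# y = relu(x * w + b)
x, w, b = Value(3.0), Value(-2.0), Value(7.0)
y = (x * w + b).relu()
y.backward()
print(y.data, x.grad, w.grad, b.grad)  # 1.0 -2.0 3.0 1.0
```

Each operation stores its inputs together with a function that maps the upstream gradient to the gradient for that input, so `backward` only has to walk the graph in reverse and accumulate.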
The project is still under active development. I'm in the process of adding mixed precision and multi-GPU support, with the goal of scaling up to larger models.
To see it in action, the best place to start is train_smol_gpt.py, which trains GPT-2 from scratch.
Let me know what you think!