Notes on training BERT from scratch on an 8GB consumer GPU | Heykuki News