Hey HN! After using a combination of Unsloth and Axolotl a lot, I kept finding it painful to figure out the right performance tuning for things like batch sizing and multi-GPU sharding. So I wrote a small Python library that sets up known-good LoRA training configurations for Llama 3.1 8B and 70B Instruct. It includes helpers for distilling from larger models and for training on serverless finetuning platforms, plus a walkthrough for distilling DeepSeek-R1 into a Llama 3.1 8B LoRA. But you can use it for pretty much any finetuning task, not just distilling large models!
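Part of why batch sizing is fiddly is that the number of examples behind each optimizer step depends on three knobs at once: per-GPU batch size, gradient accumulation steps, and GPU count. Here is a minimal sketch of that arithmetic. `LoraTrainingConfig` and `accumulation_steps_for` are hypothetical names for illustration only. They are not Unfat's API, and the numbers are placeholders, not tuned recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraTrainingConfig:
    """Hypothetical config shape for illustration; NOT Unfat's actual API."""
    base_model: str
    lora_rank: int               # rank r of the low-rank update matrices
    lora_alpha: int              # LoRA scaling numerator
    per_device_batch_size: int   # examples per forward/backward pass per GPU
    grad_accumulation_steps: int # passes accumulated before one optimizer step
    num_gpus: int

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to a single optimizer step across all GPUs.
        return self.per_device_batch_size * self.grad_accumulation_steps * self.num_gpus

    @property
    def lora_scale(self) -> float:
        # Standard LoRA scaling applied to the low-rank update: alpha / r.
        return self.lora_alpha / self.lora_rank


def accumulation_steps_for(target_batch: int, per_device_batch: int, num_gpus: int) -> int:
    """Gradient accumulation steps needed to reach a target effective batch size."""
    per_step = per_device_batch * num_gpus
    if target_batch % per_step != 0:
        raise ValueError(f"target batch {target_batch} is not divisible by {per_step}")
    return target_batch // per_step


# Placeholder values: aim for 64 examples per optimizer step on 2 GPUs.
cfg = LoraTrainingConfig(
    base_model="meta-llama/Llama-3.1-8B-Instruct",
    lora_rank=16,
    lora_alpha=32,
    per_device_batch_size=4,
    grad_accumulation_steps=accumulation_steps_for(64, per_device_batch=4, num_gpus=2),
    num_gpus=2,
)
print(cfg.grad_accumulation_steps, cfg.effective_batch_size, cfg.lora_scale)  # 8 64 2.0
```

Holding the effective batch size fixed and solving for accumulation steps lets you change GPU count or per-device batch (e.g. to fit memory) without silently changing the optimization dynamics.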
Show HN: Unfat, a library to easily train and distill LoRAs for LLMs