I am following the Piper Training Guide and the video by Thorsten Müller.
https://github.com/rhasspy/piper/blob/master/TRAINING.md https://www.youtube.com/watch?v=b_we_jma220
Data: single speaker, 18,000 files, average length 3 seconds, sample rate 22,050 Hz, LJ Speech format
Settings: batch size 32, number of epochs 10,000, precision 32, quality high
Training is resumed from the Lessac High Quality Voice Checkpoint on Hugging Face.
https://huggingface.co/datasets/rhasspy/piper-checkpoints/tree/main/en/en_US/lessac/high
When I run on a regular CPU, the program exits with out-of-memory (OOM) errors. The free T4 GPU on Google Colab is not always available, and even when it is, a single epoch takes a long time.
I am trying to estimate how many GPUs, and of what type, I should rent on Lambda Labs, and how long one epoch would take on them.
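For the per-epoch time, here is a rough sketch of how an estimate could be derived once a seconds-per-step figure has been measured on a given GPU. The function name `epoch_minutes` and the 0.5 s/step value are illustrative placeholders, not measurements. It assumes every file is used for training and that the last partial batch is kept.

```python
import math


def epoch_minutes(num_files: int, batch_size: int, seconds_per_step: float) -> float:
    """Estimate wall-clock minutes for one pass over the dataset."""
    # One step is assumed to be one batch; the last batch may be partial.
    steps_per_epoch = math.ceil(num_files / batch_size)
    return steps_per_epoch * seconds_per_step / 60.0


# 18,000 files and batch size 32 come from the setup above.
# 0.5 s/step is a made-up placeholder; replace it with a value timed on real hardware.
print(f"{epoch_minutes(18_000, 32, seconds_per_step=0.5):.1f} minutes per epoch")
```

Timing a few dozen steps on a short rental and plugging the measured value in should give a more useful number than any general figure.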
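I have also read that models like Piper and Tacotron need about 100K steps to produce a good-quality clone. If a step is one optimizer update on one batch, then steps = number of epochs * (number of files / batch size), so 18,000 files at batch size 32 gives about 563 steps per epoch, and 10,000 epochs would be about 5.63 million steps, far more than 100K. Any advice there would be appreciated as well, thanks.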
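A minimal sketch of the step arithmetic, assuming the common convention that one step is one optimizer update on one batch, that all 18,000 files are used for training (no validation split), and that the last partial batch is kept. The step counter a particular trainer reports may be defined differently, so treat these as rough figures.

```python
import math

NUM_FILES = 18_000      # from the dataset description
BATCH_SIZE = 32         # from the training settings
MAX_EPOCHS = 10_000     # from the training settings
TARGET_STEPS = 100_000  # the "100K steps" rule of thumb

# Batches needed to see every file once (last batch may be partial).
steps_per_epoch = math.ceil(NUM_FILES / BATCH_SIZE)

# Steps taken if training really ran for all configured epochs.
total_steps = steps_per_epoch * MAX_EPOCHS

# Epochs needed to reach the target step count.
epochs_for_target = math.ceil(TARGET_STEPS / steps_per_epoch)

print(f"steps per epoch:        {steps_per_epoch}")
print(f"steps in {MAX_EPOCHS} epochs: {total_steps}")
print(f"epochs for {TARGET_STEPS} steps: {epochs_for_target}")
```

Under these assumptions, 100K steps corresponds to a few hundred epochs on this dataset rather than 10,000.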