This allows folks to run Tortoise via a single API call, at a cost of $0.03/query. The WAV file is downloadable, and we apply no restrictions on it.
We're open-sourcing all our work — we made Tortoise run 30% faster, and have more improvements coming. If you're keen to contribute we can help with ideas, pointers, compute and data; just DM us. Our fork with the improvements can be found at https://github.com/metavoicexyz/tortoise-tts. The deployment code can be found at https://github.com/metavoicexyz/tortoise-tts-modal-api.
There are already great alternatives for using Tortoise: i) @mdnest_r's awesome Huggingface Spaces, ii) the original Google Colab, iii) hosting it yourself. Our work should accelerate those who need an API, don't want to spend time/$ on hosting, and need scalable infra backing them.
We're especially excited about combining text-to-speech with content generated from LLMs, and about how it fits into video creation tools.
Tortoise in its current form is also inaccessible to non-technical users, which is why we are providing a simple UI on top, likewise "at-cost": https://tts.themetavoice.xyz
To use, generate an API key on https://tts.themetavoice.xyz and call via POST request. Or use the web UI. Or run your own deployment.
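For anyone wondering what the POST call might look like, here is a rough Python sketch. This post doesn't spell out the request format, so the endpoint path, the JSON field names (`text`, `voice`), the bearer-token header, and the assumption that the response body is the raw WAV are all placeholders of mine; check the deployment repo linked above for the real ones. Only the $0.03/query price comes from the post.

```python
import json
import urllib.request

# ASSUMPTION: placeholder endpoint, not a documented URL. Replace before use.
API_URL = "https://tts.themetavoice.xyz/REPLACE_WITH_REAL_ENDPOINT"
COST_PER_QUERY_CENTS = 3  # $0.03/query, as quoted in the post


def build_tts_request(text, voice, api_key, url=API_URL):
    """Build (but do not send) a POST request for one TTS query.

    ASSUMPTION: the field names and the Authorization scheme are hypothetical.
    """
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def synthesize(text, voice, api_key, out_path="output.wav"):
    """Send the request and save the response body to disk.

    ASSUMPTION: the response body is the WAV file itself.
    """
    req = build_tts_request(text, voice, api_key)
    with urllib.request.urlopen(req, timeout=300) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path


def estimated_cost_usd(num_queries):
    """Estimate spend at the quoted per-query price."""
    return num_queries * COST_PER_QUERY_CENTS / 100
```

With a real endpoint filled in, `synthesize("Hello there", "some_voice", "YOUR_API_KEY")` would write the audio to `output.wav`, and `estimated_cost_usd(100)` gives a quick budget check for a batch of queries.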