The best project I've found so far in terms of output quality is https://github.com/neonbjb/tortoise-tts, but A) it requires an enormous amount of GPU horsepower, and B) even with that, a single autoregressive sample from a text input shorter than a tweet takes 5-10 minutes of compute time to produce.
So my question is: what projects have you all come across that strike that balance between good-enough quality and a fast time to result?