Of these models, I am most impressed with Tortoise. It's a near-perfect audio deepfake.
I'm experimenting with turning this into an app/service where you select your favorite podcasts and get an RSS feed. I'm not sure how to make that sustainable, since the compute requirements are hefty. I'm very open to feedback!