Check out the samples in their announcement blog post: https://openai.com/blog/chatgpt-can-now-see-hear-and-speak
Does anyone have an idea what tech stack they're using to generate the voices? They say they hired professional voice actors for the training, but what about the software stack?