We're using VOSK for speech recognition and VOICEVOX for Japanese speech synthesis. You can easily swap out the speech synthesis engine if needed. The robot is open source, and we're working on a kit version for easier assembly.
We've integrated ChatGPT by feeding speech recognition results into it and providing Stack-chan's character settings. The codebase is implemented using the JavaScript platform "Moddable," and you can find the ChatGPT API integration here: https://github.com/meganetaaan/stack-chan/blob/dev/v1.0/firm....
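For anyone curious about the general shape of that flow, here is a minimal sketch. It is not the repository's actual code: `buildMessages`, `appendTurn`, and the `ChatMessage` type are hypothetical names chosen for illustration, and passing the character settings as a `system` message is an assumption. The only part taken from the real API is the `role`/`content` message format used by chat-style ChatGPT requests.

```typescript
// Hypothetical sketch; names are illustrative, not from the Stack-chan repo.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Build the message list for one turn: character settings first (assumed to
// go in a system message), then earlier turns, then the recognized utterance.
function buildMessages(
  characterSettings: string,
  history: ChatMessage[],
  recognizedText: string
): ChatMessage[] {
  return [
    { role: "system", content: characterSettings },
    ...history,
    { role: "user", content: recognizedText.trim() },
  ];
}

// Record a completed exchange and keep only the most recent messages,
// so the prompt does not grow without bound.
function appendTurn(
  history: ChatMessage[],
  userText: string,
  reply: string,
  maxMessages: number = 10
): ChatMessage[] {
  const next: ChatMessage[] = [
    ...history,
    { role: "user", content: userText.trim() },
    { role: "assistant", content: reply },
  ];
  return maxMessages > 0 ? next.slice(-maxMessages) : [];
}
```

The array returned by `buildMessages` would go into the `messages` field of the chat request, and the reply text would then be handed to the speech synthesis engine. Keeping the history bounded is just one simple choice for a small device; the real firmware may handle this differently.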
ChatGPT is designed for question-and-answer dialogues, so we're exploring ways to improve turn-taking and make conversations feel more natural. We're excited about Stack-chan's new conversational abilities and welcome any interesting use case ideas you have!