
Muyan-TTS
Open-source, high-quality TTS for podcasts & voice cloning
Muyan-TTS is an open-source text-to-speech (TTS) model for podcasts, trained on 100k+ hours of audio. It offers high-quality zero-shot voice generation and speaker adaptation from just minutes of speech.




Flowtica Scribe
Hi everyone!
There's a new open-source text-to-speech model out called Muyan-TTS, from the MYZY-AI team, and it's specifically designed with podcast applications in mind.
What's notable is that Muyan-TTS was pre-trained on over 100,000 hours of podcast audio. This lets it generate high-quality voices zero-shot: given a short reference audio sample, it can synthesize speech in that voice without any additional training. For more customized voices, the fine-tuned version (Muyan-TTS-SFT) can adapt to a specific speaker using just dozens of minutes of their audio. The team has also been transparent about development, noting that the model was built on a budget of roughly $50k.
Both models (the base zero-shot model and the SFT version for speaker adaptation) have been released, along with the training code.