Fish Audio S2 - Real Expressive AI Voices
We've open-sourced Fish Audio S2, a new generation of expressive TTS that lets you direct voices with natural language. Add cues like [whisper] or [laughing nervously], generate multi-speaker dialogue in one pass, and create scary-real voices across 80+ languages.
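As a rough illustration of the inline cues mentioned above, here is a minimal Python sketch that builds a tagged script and lists the cues it contains. Only the bracket syntax ([whisper], [laughing nervously]) comes from the post. The helper names `with_cue` and `extract_cues` are hypothetical, and the sketch only prepares text. It does not call Fish Audio S2 or any API.

```python
import re


def with_cue(cue: str, text: str) -> str:
    """Prefix a line of text with an inline cue, e.g. [whisper].

    Hypothetical helper for illustration; not part of any Fish Audio library.
    """
    return f"[{cue.strip()}] {text.strip()}"


def extract_cues(script: str) -> list[str]:
    """Return every bracketed cue found in a script, in order of appearance."""
    return re.findall(r"\[([^\[\]]+)\]", script)


# Build a short script with two free-form cues.
script = " ".join([
    with_cue("whisper", "I think someone is at the door."),
    with_cue("laughing nervously", "It's probably just the wind."),
])

print(script)
print(extract_cues(script))
```

Keeping cues as plain bracketed text means the script stays readable and can be checked before it is sent to whatever TTS interface you use.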

Replies
Trufflow
Really cool how fast it can be to clone my voice. Should I be giving it multiple recordings at different emotions so that it has a better register of what I sound like?
Fish Audio
@lienchueh You absolutely can! Just ten seconds of high-quality audio of your voice, recorded with a good mic, will take you most of the way there, though. With the new open-domain emotion tags, you can direct the emotion in the speech with precision.
Just found Fish Audio this year and was surprised by the API and the S1 model. Well, S2 is now absolutely mind-blowing. Great work!
Fish Audio
@michael_pohl Awesome to hear Michael, thank you!
Fish Audio
https://x.com/i/trending/2031460658311737490
As a content creator - I've been looking for a product like this for a long time! Hope it'll match my expectations.
Fish Audio
@yotam_dahan I think Fish S2 would be the best for content creators! Excited for you to try it, let us know what you think :)
Sway
Fish Audio
@christian73 Thank you so much Christian!
Runner AI
Fish Audio is hands down one of the most impressive TTS tools I've come across. I fed it a short clip and the output genuinely sounded like me. You can make your cloned voice whisper, laugh, get excited. It's funny and a little surreal hearing yourself say things in ways you never actually did. Can't believe this is open source. Great stuff, keep it up!
Sounds promising, will check it out. Hoping this is the one product in this niche that finally works :)
As someone who used to lead a team that created dozens of voice-overs for different markets, these tools are a game-changer.
Congrats on the launch! 🎉
The focus on emotion and nuance in TTS is really interesting. A lot of voice models sound technically good but still feel a bit flat, so the idea of capturing rhythm and speaking habits is compelling.
Also impressive that voice cloning works with just ~10 seconds of audio. Curious how you’re handling consent and voice ownership safeguards as this gets adopted more widely?
FunBlocks MindMax
Congrats on this launch!