We've open-sourced Fish Audio S2, a new generation of expressive TTS that lets you direct voices with natural language. Add cues like [whisper] or [laughing nervously], generate multi-speaker dialogue in one pass, and create scary-real voices across 80+ languages.
Fish Audio S1 is the most expressive and emotionally rich TTS model, creating lifelike voices that capture emotion, rhythm, and nuance. Clone any voice in 10 seconds, preserving accent, tone, and speaking habits with unmatched realism.
Your Voice, Your Way: Open-Source TTS
Powerful, fast, and natural speech in any language. Clone voices instantly. Self-host or use our service. Lightning-fast synthesis, affordable pricing.
With just 15 seconds of any voice, Fish Speech can reliably synthesize natural and fluent speech while maintaining the given timbre, style, and accent. Our open-source team, creators of So-VITS-SVC and Bert-VITS2, proudly introduces Fish Speech.