I'm giving Fish a try against other speech apps (ElevenLabs, HeyGen, etc.). Should the tags work in the demo version? I read your docs, but the tags do not seem to work when I generate a file. The cloning was quite good, even in the demo version, which has some limitations.
Fish Audio
Hi our beloved PH!
[excited] [slightly nervous]
Today we’re launching Fish Audio S2, our new text-to-speech model.
[long pause]
Hear Fish S2 Read This!
This is a big step beyond S1, redefining expressive voice AI. Write emotion cues anywhere in the text and hear the speech flow exactly how [emphasis] YOU direct it.
And, [inhale] we’re open-sourcing all of it.
GitHub: https://github.com/fishaudio/fish-speech/
HuggingFace: https://huggingface.co/fishaudio/s2-pro/
Shout out to SGLang for powering our stack.
There’s much more to S2.
Try it yourself now: https://fish.audio/s2/
As always, we want to give back to the community. For the launch, we’re offering free generation credits and an exclusive 50% OFF promo code: PH-FishS2
Go build weird things with it :)
We’d love to hear what you make.
Fish Audio
@hehe6z incredibly proud of this one, amazing job team!
Fish Audio
@rissa_cao teamwork 👾
How does Fish Audio maintain consistent emotional prosody and rhythmic nuance across long-form content, and what specific architectural improvements over So-VITS-SVC allow for such high-fidelity cloning from only 10 seconds of source audio?
Fish Audio
@mordrag great question Denis! S2 moves beyond systems like So-VITS-SVC: it generates speech with a large speech-language model that operates on discrete audio tokens, which lets it keep emotional prosody and rhythm consistent over long passages. Because S2 is heavily pretrained on large-scale speech data, the reference clip mainly anchors speaker identity and style, so it can clone voices extremely well from just 10 seconds of sample audio.
Trufflow
Really cool how fast it can clone my voice. Should I be giving it multiple recordings with different emotions so that it has a better register of what I sound like?
Fish Audio
@lienchueh You absolutely can! Just ten seconds of high-quality audio of your voice, recorded with a good mic, will take you most of the way there though. With the new open-domain emotion tags you can direct the emotions in the speech with precision.
Calling Clones
Can I use this in a Raspberry Pi voice assistant that I have at home?
What about using the voice cloning in phone calls?
ElevenLabs is not that good... (or I don't know how to set it up)
Fish Audio
@javierfandos Hi Javi, this is a great point - yes you absolutely can! For example, Home Assistant has direct Fish Audio support; you can check out the deets here: https://www.home-assistant.io/integrations/fish_audio/. Voice cloning is also one of the flagship features our users love because of the extreme realism :)
Calling Clones
I'm launching something soon! I need to find something! Will take a look! Thanks!
Fish Audio
@javierfandos that's awesome looking forward to your launch!!
Calling Clones
@hehe6z WOW! Just cloned my voice. It's actually better than ElevenLabs!
What's the basis for the intonation or emphasis? Congrats on the launch, @hehe6z!
Fish Audio
@hehe6z @neilverma S2 is trained on over 10 million hours of audio with reinforcement learning and a dual-autoregressive architecture. Tones, emphasis, pauses, laughs, and other emotions can all be directed with natural-language emotion tags placed at any word or phrase position within the text! Thank you for your support Neil!
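For anyone checking which tags a script actually contains before generating, here is a minimal sketch. It assumes the square-bracket syntax used in the launch post (for example `[inhale]` or `[slightly nervous]`). The helper name `split_tags` is hypothetical, and the code does not call any Fish Audio API; it only separates tags from the spoken text.

```python
import re

# Assumption: a tag is any run of text inside square brackets, as in the launch post.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_tags(text):
    """Return (tags, plain_text) for a script with inline [tag] cues.

    tags is the list of cues in order of appearance; plain_text is the
    script with the cues removed and whitespace collapsed.
    """
    tags = [match.group(1).strip() for match in TAG_PATTERN.finditer(text)]
    plain = TAG_PATTERN.sub("", text)
    plain = re.sub(r"\s+", " ", plain).strip()
    return tags, plain


if __name__ == "__main__":
    script = "Hear the speech flow exactly how [emphasis] YOU direct it."
    tags, plain = split_tags(script)
    print(tags)
    print(plain)
```

If a generated file reads the bracketed words aloud, comparing the output of `split_tags` with what you hear can help confirm whether the tags reached the model as cues or as plain text.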
Fish Audio
@neilverma thank you Neil!!
Excited to see the new version coming! Will it support any new languages?
Fish Audio
@vladimir_osipov Thank you Vladimir! Yeah, language support has expanded significantly compared to S1. S2 Pro supports 80+ languages.
Tier 1: Japanese (ja), English (en), Chinese (zh)
Tier 2: Korean (ko), Spanish (es), Portuguese (pt), Arabic (ar), Russian (ru), French (fr), German (de)
Other supported languages: sv, it, tr, no, nl, cy, eu, ca, da, gl, ta, hu, fi, pl, et, hi, la, ur, th, vi, jw, bn, yo, sl, cs, sw, nn, he, ms, uk, id, kk, bg, lv, my, tl, sk, ne, fa, af, el, bo, hr, ro, sn, mi, yi, am, be, km, is, az, sd, br, sq, ps, mn, ht, ml, sr, sa, te, ka, bs, pa, lt, kn, si, hy, mr, as, gu, fo, and more.
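If you want to check a language code against the tiers above in your own tooling, a small lookup like the following may help. This is only a sketch built from the list in this comment: the function name `support_tier` and the returned labels are hypothetical, and because the list ends with "and more", a code that is missing here is not necessarily unsupported.

```python
# Tiers copied from the list above; labels and function name are illustrative only.
TIER_1 = {"ja", "en", "zh"}
TIER_2 = {"ko", "es", "pt", "ar", "ru", "fr", "de"}
OTHER_LISTED = set(
    "sv it tr no nl cy eu ca da gl ta hu fi pl et hi la ur th vi jw bn yo sl "
    "cs sw nn he ms uk id kk bg lv my tl sk ne fa af el bo hr ro sn mi yi am "
    "be km is az sd br sq ps mn ht ml sr sa te ka bs pa lt kn si hy mr as gu fo".split()
)


def support_tier(code):
    """Map a language code to the tier named in the list above."""
    code = code.strip().lower()
    if code in TIER_1:
        return "tier 1"
    if code in TIER_2:
        return "tier 2"
    if code in OTHER_LISTED:
        return "other listed"
    # The source list ends with "and more", so this does not mean unsupported.
    return "not listed"


if __name__ == "__main__":
    for code in ("ja", "de", "fo", "xx"):
        print(code, support_tier(code))
```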
Klariqo AI Voice Assistants
Oh my, this is mind-blowing. Does it support streaming when self-hosted?
Fish Audio
@ansh_deb Oh hey Ansh good to see you again!! Yes it surely does!
Klariqo AI Voice Assistants
@hehe6z That's amazing! Would love to give it a try soon!
Fish Audio
@ansh_deb let me know if we can support with anything!