Zac Zuo

Chatterbox Turbo - Fast, expressive, open source TTS with native watermarking

Chatterbox Turbo is a 350M parameter open-source TTS model. It features paralinguistic tags (control laughs, sighs, etc.), zero-shot cloning, and runs 6x faster than real-time. Uniquely includes built-in PerTh watermarking for safety.

Zac Zuo

Hi everyone!

This is a really generous release from the Resemble AI team. The "paralinguistic tags" feature is super interesting: being able to simply type [laugh] or [sigh] to control the emotion is a very practical touch for getting natural results.
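For anyone curious how inline tags like these might be handled, here is a minimal sketch of the kind of preprocessing a TTS frontend could do: splitting a script with bracketed tags into ordered speech segments and tag events. The bracket syntax matches the examples above, but the parsing approach and tag vocabulary here are illustrative assumptions, not Chatterbox's actual implementation.

```python
import re

# Matches bracketed paralinguistic tags such as [laugh] or [sigh].
# The tag names are illustrative; check the model docs for the
# actually supported vocabulary.
TAG_PATTERN = re.compile(r"\[(\w+)\]")

def split_tagged_script(text):
    """Return an ordered list of ("speech", ...) and ("tag", ...) events."""
    events = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        speech = text[pos:match.start()].strip()
        if speech:
            events.append(("speech", speech))
        events.append(("tag", match.group(1)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        events.append(("speech", tail))
    return events

print(split_tagged_script("Well [sigh] that was a long day [laugh]"))
# → [('speech', 'Well'), ('tag', 'sigh'),
#    ('speech', 'that was a long day'), ('tag', 'laugh')]
```

In practice you would likely pass the tagged string straight to the model and let it handle the tags internally; this just makes the "type a tag inline" idea concrete.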

I also really appreciate that it includes the PerTh watermarking by default. It is rare to see safety features baked directly into an MIT-licensed model like this.

Fast, expressive, and traceable. This model has huge potential in the open source TTS space.

A R
@zaczuo Agree! The quality is quite good, comparable to ElevenLabs' v3.0 model in alpha. Hope this is made available on voice agent platforms like LiveKit soon!
Malek Moumtaz

@zaczuo Hey, congrats on the launch! Paralinguistic tags are a great abstraction. How do you think about standardizing or evolving that tag vocabulary over time, so creators get expressive control without fragmenting compatibility across tools and models in the open-source TTS ecosystem?

Sujal Meghwal

@zaczuo Congrats on the Chatterbox Turbo launch. Shipping native watermarking by default is a strong signal in the current voice-AI risk landscape.

I run a security firm focused specifically on AI abuse and adversarial testing, rather than traditional web pentesting. This week we completed our first AI-focused penetration test, where we validated a real-world weakness in the application logic itself (not the infrastructure), related to how AI safety assumptions were enforced under abuse scenarios.

That experience is what prompted me to reach out. We help voice and generative-media companies pressure-test areas like watermark evasion, consent bypass paths, API abuse, and output-based model extraction before those issues are discovered externally.

I’m not reaching out to sell tooling; rather, I’d like to see if you’d be open to a short conversation about how adversaries actually try to bypass safeguards like watermarking and voice controls, and what testing has proven most useful so far. Happy to share concrete examples if useful.

Germán Merlo

Wow man! I'll give it a try. All the best here

Samuel Rondot

I will definitely implement it on storyshort!

Alex Cloudstar

Neat. I do light VO/podcast stuff and the speech-to-speech + quick edits are what I care about. Zero-shot clone for pickups sounds handy. Big plus on watermarking + detection—feels safer. Curious how natural the laughs/sighs controls come out.

Abdul Rehman

This is amazing, man! Audio editing is usually a pain. If this actually simplifies it, that’s a big win.

Tarfa Ali

I gave it a try; what you did is really nice.

Avhijit Nair

Wow, this is definitely a game changer!

Mykyta Semenov 🇺🇦🇳🇱

Very interesting! Which languages are supported? If I provide a sample in one language, can I copy my voice and have the service read something in another language using my voice?