DramaBox by Resemble AI - AI turns scene descriptions into vocal performances

Resemble AI

•2mo ago

A TTS model should give you two things: an Oscar-worthy performance and a verifiable signature to prove it's yours. DramaBox is the first to do both. Describe a scene the way you would to an actor, like 'a talk show host gasps in mock shock, bursts into laughter,' and the model interprets it as performance. Every output is watermarked with Resemble Watermarker. It is open source and English-only for now; find it in your Resemble account or on Hugging Face.

Replies

Resemble AI

Maker

Hey Product Hunt 👋 Tedi here, CTO at Resemble AI. Today we're launching DramaBox, a watermarked and open-source TTS model you direct the way you'd direct an actor.

What you put in:
- A natural language scene description
- Dialogue in quotes (e.g., "HAHAHA! I can't believe it!")
- Optional: a reference voice that the model will clone from, or let the model pick one that fits the scene
- Example: 'A talk show host gasps with shock, "No! You did NOT just say that!" He bursts into uncontrollable laughter, "Hahaha! Oh my god, oh my god!" He wheezes, "I cannot, I literally cannot breathe right now!"'

What you get out:
- An expressive vocal performance with laughter, gasps, pacing, and emphasis driven by the scene description
- 48kHz stereo, broadcast-quality output
- A single audio file ready to ship
- PerTh watermarking embedded at generation for provenance

All without style tags, emotion tags or post-processing required!

DramaBox is open source, and available on GitHub, Hugging Face or in your Resemble account.

A note on why we included watermarking by default: gen AI is being used in increasingly malicious ways, so voice AI with embedded security gives everyone who builds with DramaBox a verification advantage. If you're building an agent, feel free to check out our detection API as well.

Curious to see and hear what you build, please share feedback or any interesting edge cases with us here.

Tedi
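As a minimal sketch of the input format described above, the Python helper below assembles a scene description from stage directions and quoted dialogue. The function name `build_scene_prompt` and the (direction, line) pairing are assumptions made for illustration; they are not part of DramaBox or any Resemble AI API, and the snippet only builds the text prompt without calling the model.

```python
def build_scene_prompt(beats):
    """Join (direction, line) pairs into one scene description.

    Hypothetical helper: each beat is a stage direction followed by the
    spoken line, with the dialogue wrapped in double quotes.
    """
    parts = []
    for direction, line in beats:
        # Drop stray whitespace and any trailing comma before re-adding one.
        direction = direction.strip().rstrip(",")
        parts.append(f'{direction}, "{line.strip()}"')
    return " ".join(parts)


prompt = build_scene_prompt([
    ("A talk show host gasps with shock", "No! You did NOT just say that!"),
    ("He bursts into uncontrollable laughter", "Hahaha! Oh my god, oh my god!"),
    ("He wheezes", "I cannot, I literally cannot breathe right now!"),
])
print(prompt)
```

Keeping the directions and the dialogue as separate fields makes it easy to vary the direction for a line while leaving the spoken text untouched.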

2mo ago

Mailwarm

When using DramaBox, how consistent is the performance if you generate the same prompt twice? Do you get similar delivery or totally different takes?

2mo ago

Resemble AI

Maker

@thamibenjelloun you get different takes unless you fix the seed.
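To illustrate the general idea of fixing a seed, here is a small Python sketch using only the standard library. The function `sample_take` is a hypothetical stand-in for any stochastic generation step; it is not the DramaBox API, and how DramaBox actually exposes its seed is not shown here.

```python
import random


def sample_take(seed=None, steps=5):
    """Stand-in for a stochastic generation call (NOT the DramaBox API)."""
    # With seed=None the generator is seeded from OS entropy,
    # so each call produces a different sequence.
    rng = random.Random(seed)
    return [rng.random() for _ in range(steps)]


take_a = sample_take(seed=1234)
take_b = sample_take(seed=1234)
take_c = sample_take()  # unseeded: a fresh take on every call

print(take_a == take_b)  # a fixed seed reproduces the same draws
```

The same principle applies to sampling-based audio models in general: identical inputs plus an identical seed give the same random draws, while an unseeded run gives a new take.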

2mo ago

The shift from text to vocal performance is exciting for storytellers. Is there support for multiple characters with distinct accents within a single scene description?

2mo ago

Resemble AI

Maker

@rivra_dev we haven't thoroughly tested multiple characters in a scene, but in theory, yes, it should be supported. We'd be curious to see your findings.

2mo ago

Directing TTS like an actor with scene context rather than emotion tags is the right abstraction; tags always felt like a workaround for not having enough context. Curious how it handles ambiguity in the scene description. If two people would direct the same line differently, does the model pick one interpretation, or is there variance across runs?

2mo ago