DramaBox by Resemble AI - AI turns scene descriptions into vocal performances
A TTS model should give you two things: an Oscar-worthy performance and a verifiable signature to prove it's yours. DramaBox is the first to do both. Describe a scene the way you would to an actor, like 'a talk show host gasps in mock shock, bursts into laughter,' and the model interprets it as a performance. Every output is watermarked with Resemble Watermarker. It's open source and English-only for now; find it in your Resemble account or on Hugging Face.


Replies
Resemble AI
mailX by mailwarm
When using DramaBox, how consistent is the performance if you generate the same prompt twice? Do you get a similar delivery or totally different takes?
Resemble AI
@thamibenjelloun you get different takes unless you fix the seed.
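For anyone wondering what "fix the seed" means in practice, here is a minimal sketch of the general idea. It does not use DramaBox's actual API: `sample_take` and `DELIVERIES` are hypothetical stand-ins for a stochastic sampler, and the only real library used is Python's standard `random` module. The point is just that the same prompt plus the same seed reproduces the same sequence of random choices, while leaving the seed unset gives a fresh draw each time.

```python
import random
from typing import List, Optional

# Hypothetical delivery styles; a stand-in for the random choices a
# generative model makes. This is NOT DramaBox's API.
DELIVERIES = ["breathless", "deadpan", "giggling", "theatrical", "whispered"]


def sample_take(prompt: str, seed: Optional[int] = None) -> List[str]:
    """Pick one pseudo-random delivery style per word of the prompt."""
    # A dedicated generator keeps the draw independent of global state.
    # With seed=None, Python seeds it from system entropy or the clock.
    rng = random.Random(seed)
    return [rng.choice(DELIVERIES) for _ in prompt.split()]


prompt = "a talk show host gasps in mock shock"

# Same prompt, same seed: the two takes are identical.
fixed_a = sample_take(prompt, seed=1234)
fixed_b = sample_take(prompt, seed=1234)
print(fixed_a == fixed_b)

# Same prompt, no seed: each call draws independently, so takes usually differ.
free_a = sample_take(prompt)
free_b = sample_take(prompt)
print(free_a, free_b)
```

How the seed is actually exposed in DramaBox (a parameter name, a UI field, or something else) isn't stated in this thread, so check the model card or your Resemble account for the real setting.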
The shift from text to vocal performance is exciting for storytellers. Is there support for multiple characters with distinct accents within a single scene description?
Resemble AI
@rivra_dev we haven't thoroughly tested multiple characters in a single scene, but in theory it should work. We'd be curious to see your findings.
Directing TTS like an actor, with scene context rather than emotion tags, is the right abstraction. Tags always felt like a workaround for not having enough context. Curious how it handles ambiguity in the scene description: if two people would direct the same line differently, does the model pick one interpretation, or is there variance across runs?