Modulate

Convert your speech into anyone's voice, right as you speak!


Modulate uses machine learning to create customizable voice skins from only second-long clips. Sound like a celebrity, a favorite character, or even a different gender or race: all of these and more will soon be on the tip of your tongue.

Modulate is currently a prototype, but static demos are on the website, and an interactive demo is coming soon!

Discussion
Michael Pappas (Maker) @mpappas86 · CEO @ Modulate.ai
Hey everyone! I'm Modulate's CEO, Mike Pappas (our CTO, Carter Huffman, is around here somewhere too). We're super excited to be sharing our voice skin technology here. We think that giving people creative control over their voices will open up whole new worlds of possibilities for entertainment, marketing, apps, and much more! (Oh, and before you add "and fake news", let me mention that we are worried about that too! Please take a look at our Ethics page for some discussion of the controls we use to avoid misuse, such as watermarking. We're also more than happy to answer questions here: while we're confident Modulate will be a major net benefit for us all, we absolutely want to make sure we make it available in a responsible way!) Any questions, concerns, or ideas? Please comment below!
Zak Mandhro @zakmandhro · Founder
At first glance, I thought it was just doing Automatic Speech Recognition -> Text-To-Speech but I'm glad to learn it's actually processing the audio and modulating the artificial voice. Nice work guys!
Michael Pappas (Maker) @mpappas86 · CEO @ Modulate.ai
@zakmandhro Thanks, and it's great to know that we could be clearer about what makes this so exciting! As you say, ASR->TTS could do something similar, but by modifying the audio directly we're able to make sure the voice skin keeps all the original nuance and emotion of the input speech. This means our outputs sound just as rich as real speech. Bypassing text also lets us run the whole process in real time as you speak, instead of only as a post-processing filter!