TADA - 1:1 text-acoustic alignment for 5x faster speech generation

TADA (Text-Acoustic Dual Alignment) is Hume AI's open-source speech-language model that synchronizes text and speech into a single continuous stream via 1:1 token alignment. It generates audio at 5x the speed of conventional LLM-based TTS systems and produced no skipped words or content hallucinations across 1000+ tests.

Hi everyone!

TADA is one of the most interesting open-source voice releases I’ve seen in a while.

The big idea is simple but brilliant: it aligns text and audio one-to-one, so the model never has to juggle that huge mismatch between text tokens and acoustic frames. That single change unlocks the three things people actually care about in TTS: way better speed, much longer context, and basically zero content hallucinations.


Hume reports 5x faster generation than similar LLM-based systems, zero hallucinations across 1,000+ test samples, and it can fit roughly 700 seconds of audio in a 2,048-token context where other models tap out way earlier.
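To put that context figure in perspective, here is a quick back-of-the-envelope sketch. It uses only the two numbers quoted above (2,048 tokens, roughly 700 seconds); the function names are mine, and the 12.5 frames-per-second comparison rate is a purely hypothetical value chosen for illustration, not a figure reported by Hume.

```python
def implied_tokens_per_second(context_tokens: int, audio_seconds: float) -> float:
    """Average number of context tokens consumed per second of audio."""
    return context_tokens / audio_seconds


def seconds_that_fit(context_tokens: int, tokens_per_second: float) -> float:
    """Seconds of audio that fit in a context at a given token rate."""
    return context_tokens / tokens_per_second


CONTEXT_TOKENS = 2048      # context size quoted above
TADA_AUDIO_SECONDS = 700   # approximate audio duration quoted above

# Rate implied by the quoted figures: about 2.93 tokens per second of audio.
tada_rate = implied_tokens_per_second(CONTEXT_TOKENS, TADA_AUDIO_SECONDS)
print(f"Implied rate: {tada_rate:.2f} tokens/s")

# ASSUMPTION: 12.5 frames/s is a made-up fixed acoustic frame rate, used only
# to show how quickly a frame-based token stream fills the same context.
HYPOTHETICAL_FRAME_RATE = 12.5
frame_based_seconds = seconds_that_fit(CONTEXT_TOKENS, HYPOTHETICAL_FRAME_RATE)
print(f"At {HYPOTHETICAL_FRAME_RATE} frames/s: {frame_based_seconds:.2f} s of audio")
```

Roughly three tokens per second of audio is in the same range as the pace of ordinary speech, which is what you would expect if each text token carries its own slice of audio.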

Releasing the models under an open-source license gives the community a massive new tool for building highly reliable voice agents, especially on the edge.

I'm gonna use it today on my Raspberry Pi at home. Claude said it was the best option available!

How does Hume measure and validate whether its AI systems are genuinely improving human emotional well-being rather than simply optimizing for engagement or perceived satisfaction?

Will be waiting for the GGUF.

Congratulations on the launch guys, this definitely looks promising!

But does the 1:1 alignment still work well with expressive speech or emotional tones?