Zac Zuo

Higgs Audio v2 - Lifelike, emotionally competent voice generation

Higgs Audio v2 by BosonAI is an open-source audio foundation model. It generates expressive multi-speaker dialogue and long-form audio, and it achieves a 75.7% win rate over gpt-4o-mini-tts on the Emotions category of the EmergentTTS-Eval benchmark. It is now available for developers.

Zac Zuo

Hi everyone!

BosonAI has open-sourced Higgs Audio v2, a new and powerful audio foundation model.

It’s built for creating lifelike, multi-speaker conversations and long-form audio. The model also has a strong focus on emotional expression, achieving a 75.7% win rate over gpt-4o-mini-tts on the Emotions category of the EmergentTTS-Eval benchmark.

Because it's open source, it's easy to get started with. The smaller models can even run on a Jetson Orin Nano!