Nari Labs

Open-Source AI Voice with Emotion & Realism

Nari Dia is a text-to-speech model that generates ultra-realistic dialogue, complete with emotional tone and nonverbal cues like laughter and sighs. Zero-shot voice cloning and real-time performance make it a game-changer for creators. 🔗 narilabs.org
Free

Yashank Goswami
Hunter
Hey Product Hunt! I recently discovered Nari Dia while exploring open-source text-to-speech solutions, and it’s truly impressive. Developed by a small team at Nari Labs, Dia is a 1.6B-parameter model that brings text to life with remarkable realism.

🎯 Key Features:
• Emotional Nuance: Captures emotions like joy, sadness, and surprise in speech.
• Nonverbal Cues: Includes natural sounds like laughter, sighs, and coughs.
• Zero-Shot Voice Cloning: Mimics a speaker’s voice from just a few seconds of audio.
• Real-Time Performance: Operates efficiently on a single GPU (~10GB VRAM).
• Open-Source: Available under the Apache 2.0 license, with resources on GitHub and Hugging Face.

Whether you’re developing virtual assistants, creating audiobooks, or enhancing gaming experiences, Dia offers a powerful and accessible solution.

Questions for the Makers:
• What inspired the development of Dia?
• Are there plans to support additional languages beyond English?
• How can the community contribute to Dia’s growth and development?

I’m excited to see how others will leverage Dia in their projects! Let’s discuss below: 🔗 narilabs.org
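
For anyone wondering how a 1.6B parameter model relates to the ~10GB VRAM figure, here is a rough back-of-the-envelope sketch. It estimates the memory taken by the weights alone at common numeric precisions. The helper name `weight_memory_gb` is made up for illustration, and the numbers are plain arithmetic, not measurements of Dia. Actual usage also includes activations, caches, and any audio codec, which this ignores.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for the model weights alone, in decimal gigabytes (1 GB = 1e9 bytes).

    Hypothetical helper for illustration; it ignores activations, caches,
    and framework overhead, so real VRAM usage will be higher.
    """
    return num_params * bytes_per_param / 1e9


PARAMS = 1.6e9  # parameter count quoted in the post

# Bytes per parameter for common precisions (assumed; the post does not say
# which precision the ~10GB figure refers to).
for name, width in [("float32", 4), ("float16/bfloat16", 2)]:
    print(f"{name}: {weight_memory_gb(PARAMS, width):.1f} GB for weights")
```

Under these assumptions the weights come to about 6.4 GB in 32-bit and 3.2 GB in 16-bit precision, so the quoted ~10GB would leave headroom for everything else the model needs at inference time.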
Joy Wang

Nari Dia is a standout text-to-speech model that truly pushes the boundaries of voice generation. The emotional nuance—like laughter, sighs, and subtle shifts in tone—adds a level of realism I haven’t seen in many other tools. With zero-shot voice cloning and real-time delivery, it feels like a game-changer for anyone working in storytelling, games, or voice-driven content.