Sesame

Conversational Speech Model Achieving Voice Presence

At Sesame, our goal is to achieve “voice presence”—the magical quality that makes spoken interactions feel real, understood, and valued.

Zac Zuo

Hi everyone!

Sharing Sesame's Conversational Speech Model (CSM), a big step beyond typical text-to-speech. The goal is to achieve what Sesame calls "voice presence": making spoken interactions feel real, understood, and valued.

Here is a PH version of the model's system card :)

😃 Emotional Context: It tries to understand and respond to the emotion in the conversation.
⏱️ Conversational Dynamics: It aims for natural timing, pauses, and intonation.
🧠 Contextual Awareness: It adapts its tone and style to the situation.
👤 Consistent Personality: It maintains coherence.
👂 Multimodal: It understands both text and audio input.
🗣️ End-to-End: It generates speech directly, in a single stage, for greater efficiency.
🔓 Open Source: Models will be released under the Apache 2.0 license.

They've built a custom evaluation suite to measure these conversational aspects, because traditional metrics (like Word Error Rate) don't really capture how natural the speech sounds.
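For anyone who hasn't met Word Error Rate before, here is a minimal sketch of how it is usually computed: the word-level edit distance between a reference transcript and a recognized transcript, divided by the number of reference words. The function name and the example sentences are my own illustration, not anything from Sesame's evaluation suite. It only compares words, so two very different deliveries of the same sentence get the same score.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name), not Sesame's evaluation code.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # A flat reading and an expressive reading that transcribe to the
    # same words both score 0.0, which is the limitation described above.
    print(word_error_rate("the cat sat on the mat", "the cat sat on the mat"))
    # One substitution plus one deletion against six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

The score depends only on which words come out, so timing, pauses, and intonation never enter into it.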

The model itself is based on the Llama architecture, but with a clever split-transformer design.

You can try a demo to experience the conversational voice. (It's magical, believe me.)

Hunting credits to @sentry_co 🙌

Traun Leyden

@sentry_co @zaczuo It's amazing that this will be released under Apache 2!

Swagam Dasgupta

A friend sent this to me, and I just tried the demo by putting Sesame in conversation with ChatGPT's voice agent. The difference is huge. It's definitely much more human-like, especially with its intonations and (micro?) expressions. The only hitch: I found it to be extra sensitive to external noises, which made it pause in the middle of its speech. Barring that, it's the most emotionally mature voice model on the market imo.

Ronit Soin

Feels almost sentient. I talked to Miles and told it to tweak its sarcasm down to 10%. Let's say, he knows how to play along :)

Rodrigo Muñoz

Amazing, very good job, I could be talking for hours 👏

Anshu Pawan

I couldn't believe I was talking to an AI voice!

Incredible. That was genuinely the most realistic, engaging conversation I have had with AI so far! Pretty damn breathtaking.

Jacob Hokanson

This is a pretty stunning release. There were moments where I had to remember that I was speaking with AI, everything seemed so perfectly nuanced. Then there were moments of absolute derailment with intonation where I needed no reminding. Overall, super exciting evolution here. I can't wait to see what's next from Sesame.
