We asked what felt off about AI voices, you told us. We’re fixing it.

by Sourav Sanyal

Over the past few months, we’ve been talking to a lot of you using Velo.

Real conversations: people trying it out, sending clips, pointing things out.

And almost everyone said some version of the same thing: “It sounds like me but something feels missing.”

At first, we thought it was about accuracy. Maybe the voice wasn’t close enough. But the more we listened, the clearer it became that accuracy wasn’t the issue.

The issue was how it felt. The tone stays a bit too samey. The emphasis doesn’t always land where you expect it to. And the little natural shifts that make your voice yours just aren’t fully there yet. It sounds right, but it doesn’t feel alive.

So we went back and started reworking how we think about voice cloning at Velo. Not just matching how you sound, but capturing how you express yourself. The way your voice changes when you’re explaining something, when you’re just talking casually, or when you actually care about what you’re saying.

That’s what we’re building now. The next version of Velo is focused on higher-fidelity voice cloning. More nuance. Better pacing. More natural expression.

Something that doesn’t feel like a generated voice reading your script, but closer to you actually speaking.

We’re still building it, but it’s coming together fast. We’re planning to ship this soon.

If you’ve used Velo before, we’d love to know - what do you think about Velo's voice cloning or other workflows? What would make it feel right?

We’re listening.

Replies

Isaac Dominic

What signals define "feels alive" for you?

Sourav Sanyal
@isaac_dominic1 Our internal benchmark is whether it feels authentic: it should sound exactly like you, except that we make your first take the best one.

Dontell Levesque

@isaac_dominic1 I feel most alive when I’m learning something new and my curiosity is fully engaged. It’s like my brain lights up and I want to keep going deeper.

Cerca Hedgecock

@isaac_dominic1 I’ve noticed I feel alive when I’m completely absorbed in a task and lose track of time. It’s like my mind stops wandering and I’m just there, doing. No pressure, no distraction, just flow.

Martha S Bako

@isaac_dominic1 For me, feeling alive often comes in quiet moments, not loud ones. Like when I’m walking alone and suddenly realize how peaceful everything feels. It’s subtle but it makes me feel deeply connected to myself.

Morgan Nabors

@isaac_dominic1 I feel alive when I’m creating something, even if it’s imperfect. Writing, planning or building ideas gives me a sense of momentum. It feels like I’m expressing something that already exists inside me.

Daisy Morgan

Is this more about data or model architecture?

Sourav Sanyal
@daisy_morgan2 primarily model architecture
Dontell Levesque

I think adding more variation in pacing and slight imperfections could make it feel more human and less like a polished recording.

Tina Kim

What kind of content exposes voice limitations the most? @sourav_sanyal

Sourav Sanyal
@tina_kim2 Anything with numbers. LLMs are really bad at handling numbers or numerical ops.

Yara Simone

How do you avoid over-smoothing the voice?

Sourav Sanyal
@yara_simone We try to design the voice and add a lot of checkpoints while it is being generated.

Rakesh Gupta

This really captures the gap I feel with most AI voices. The sound is close, but the emotion and timing always feel slightly off.

Alex J Jemmy

Overall, focusing on how it feels instead of how it sounds feels like the right move 👍 If you get that right, it could change how people actually use voice tools.

Judit

Totally agree. The problem isn’t accuracy anymore, it’s expression.
The “it sounds like me but doesn’t feel like me” is exactly where most tools break.

Elena K

This feels like exactly the right insight.

“Sounds like me” and “feels like me” are completely different product thresholds. A lot of AI products can get surprisingly far on resemblance, but people notice very quickly when expression, emotional timing, and natural variation are missing.

That’s where the uncanny feeling usually lives - not in the obvious errors, but in the absence of subtle life.

We think about something similar at SpeakUp. In any workflow connected to people, communication, and trust, the missing layer is often not functionality - it’s nuance.

Really strong direction. If you can make voice cloning feel less like playback and more like presence, that’s a meaningful leap.

Sai Tharun Kakirala

The uncanny valley of AI voices is so real. The specific thing that gets me is the rhythm - most AI voices nail individual word pronunciation but miss the natural flow of how humans accelerate or slow down across a sentence based on meaning and emphasis. It ends up feeling like someone reading words rather than saying them. Building Hello Aria (text-based AI assistant via WhatsApp and iOS), we deliberately stayed text-first partly for this reason - the text medium has more tolerance for AI-style communication than voice does. But the teams cracking the voice problem are doing something genuinely hard. Really looking forward to hearing how the fixes you're shipping actually change the listening experience.