We asked what felt off about AI voices. You told us. We’re fixing it.
Over the past few months, we’ve been talking to a lot of you who use Velo.
These were real conversations: people trying it out, sending clips, pointing things out.
And almost everyone said some version of the same thing: “It sounds like me, but something feels missing.”
At first, we thought it was about accuracy. Maybe the voice wasn’t close enough. But the more we listened, the clearer it became that accuracy wasn’t the issue.
The issue was how it felt. The tone stays a bit too samey. The emphasis doesn’t always land where you expect it to. And the little natural shifts that make your voice yours just aren’t fully there yet. It sounds right, but it doesn’t feel alive.
So we went back and started reworking how we think about voice cloning at Velo. Not just matching how you sound, but capturing how you express. The way your voice changes when you’re explaining something, when you’re just talking casually, or when you actually care about what you’re saying.
That’s what we’re building now. The next version of Velo is focused on higher-fidelity voice cloning. More nuance. Better pacing. More natural expression.
Something that doesn’t feel like a generated voice reading your script, but closer to you actually speaking.
We’re still building it, but it’s coming together fast. We’re planning to ship this soon.
If you’ve used Velo before, we’d love to know: what do you think about Velo’s voice cloning or other workflows? What would make it feel right?
We’re listening.


Replies
What signals define "feels alive" for you?
Velo
@isaac_dominic1 I feel most alive when I’m learning something new and my curiosity is fully engaged. It’s like my brain lights up and I want to keep going deeper.
@isaac_dominic1 I’ve noticed I feel alive when I’m completely absorbed in a task and lose track of time. It’s like my mind stops wandering and I’m just there, doing. No pressure, no distraction, just flow.
@isaac_dominic1 For me, feeling alive often comes in quiet moments, not loud ones. Like when I’m walking alone and suddenly realize how peaceful everything feels. It’s subtle but it makes me feel deeply connected to myself.
@isaac_dominic1 I feel alive when I’m creating something, even if it’s imperfect. Writing, planning or building ideas gives me a sense of momentum. It feels like I’m expressing something that already exists inside me.
Is this more about data or model architecture?
Velo
I think adding more variation in pacing and slight imperfections could make it feel more human and less like a polished recording.
Velo
What kind of content exposes voice limitations the most? @sourav_sanyal
Velo
How do you avoid over-smoothing the voice?
Velo
This really captures the gap I feel with most AI voices. The sound is close, but the emotion and timing always feel slightly off.
Overall, focusing on how it feels instead of how it sounds seems like the right move 👍 If you get that right, it could change how people actually use voice tools.
Totally agree. The problem isn’t accuracy anymore, it’s expression.
The “it sounds like me but doesn’t feel like me” gap is exactly where most tools break.
This feels like exactly the right insight.
“Sounds like me” and “feels like me” are completely different product thresholds. A lot of AI products can get surprisingly far on resemblance, but people notice very quickly when expression, emotional timing, and natural variation are missing.
That’s where the uncanny feeling usually lives - not in the obvious errors, but in the absence of subtle life.
We think about something similar at SpeakUp.
In any workflow connected to people, communication, and trust, the missing layer is often not functionality - it’s nuance.
Really strong direction. If you can make voice cloning feel less like playback and more like presence, that’s a meaningful leap.
Hello Aria
The uncanny valley of AI voices is so real. The specific thing that gets me is the rhythm - most AI voices nail individual word pronunciation but miss the natural flow of how humans accelerate or slow down across a sentence based on meaning and emphasis. It ends up feeling like someone reading words rather than saying them. Building Hello Aria (text-based AI assistant via WhatsApp and iOS), we deliberately stayed text-first partly for this reason - the text medium has more tolerance for AI-style communication than voice does. But the teams cracking the voice problem are doing something genuinely hard. Really looking forward to hearing how the fixes you're shipping actually change the listening experience.