Forums
v1.3.0: script import/export, progress bars, and an embarrassing cloning bug
Shipped another update today. Two things people asked for, one thing I should have caught earlier, and one quiet fix.
The embarrassing one first:
Voice cloning could silently fail. If you hadn't downloaded the Expressive engine model yet, the cloning process would run, appear to finish successfully, even show engine badges on the card. But the cloned voice wouldn't actually work. It looked fine. It wasn't.
The real cost breakdown of running a faceless YouTube channel
Nobody talks honestly about what faceless YouTube channels actually cost to run. So here's a real breakdown.
Monthly costs for a 2-video/week channel:
Voiceover (the biggest variable):
What we shipped in v1.2.1 (Windows GPU + stability fixes)
This one's mostly a Windows release.
The main change: if you're on Windows and using the Expressive or Multilingual engine, generation now runs on your GPU rather than your CPU. It's faster. It kicks in automatically with no setup needed. If your GPU doesn't support it for some reason, the app falls back to CPU without any fuss. You'll see a small GPU label in the engine selector when it's active.
Two other fixes landed with it:
Some Windows users were hitting a crash on startup. Tracked it down and patched it.
Prototyping NPC dialogue on a zero budget
I keep watching indie game devs burn time and money on voice acting way too early in development. Here's what actually works when you're prototyping on a budget of zero.
Phase 1: Text-only playtesting
Start here. Seriously. Put your dialogue in text boxes and watch playtesters read it. You'll cut 30% of your lines before anyone speaks a word. Written dialogue that reads well often sounds terrible spoken aloud, and vice versa. Test the script before you voice it.
New in v1.0.11: Pause nodes for precise silence control in scripts
Quick update from the trenches.
One thing that kept coming up in early feedback: there was no way to control silence in generated audio. You'd write a dramatic script, generate it, and the timing between lines felt off. No breathing room. No pauses for effect.
So we built Pause nodes.
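For anyone curious what the idea looks like in code, here is a minimal sketch. It is not how Vois implements Pause nodes: the `Line` and `Pause` classes, the `fake_tts` stand-in, and the 24 kHz sample rate are all assumptions made up for illustration. The point is only that a pause is a script node that renders as an exact number of silent samples between spoken lines.

```python
from dataclasses import dataclass
from typing import List, Union

SAMPLE_RATE = 24_000  # assumed sample rate, samples per second


@dataclass
class Line:
    text: str  # a spoken line of the script


@dataclass
class Pause:
    seconds: float  # how much silence to insert


Node = Union[Line, Pause]


def fake_tts(text: str) -> List[float]:
    """Hypothetical stand-in for a real engine: 0.1 s of placeholder audio per word."""
    samples_per_word = SAMPLE_RATE // 10
    return [0.5] * (samples_per_word * len(text.split()))


def render(script: List[Node]) -> List[float]:
    """Concatenate spoken lines and silence into one list of samples."""
    audio: List[float] = []
    for node in script:
        if isinstance(node, Pause):
            # Silence is just zero-valued samples of the requested length.
            audio.extend([0.0] * round(node.seconds * SAMPLE_RATE))
        else:
            audio.extend(fake_tts(node.text))
    return audio


script = [Line("He opened the door."), Pause(1.5), Line("Nobody was there.")]
audio = render(script)
print(len(audio) / SAMPLE_RATE)  # total duration in seconds
```

Because the pause is explicit in the script, changing the timing means editing one number rather than regenerating the lines around it.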
Week 1 post-launch: what broke, what surprised us, what we shipped
You'd think I'd be ready for launch week chaos. I was not.
Vois launched here on March 5. Here's the honest recap.
The numbers:
99 upvotes, #13 for the day
116 followers
9 comments on the launch post
~50 downloads in Week 1
First Product Hunt review received
Launching Vois on Thursday 5th March — a desktop voice AI studio
Hey PH community,
I'm launching Vois on Thursday. It's a desktop voice AI studio I've been building as a solo maker for the past year.
Some of you may have seen my earlier threads here about voice production costs for game devs, podcast workflows, audiobook production, and accessibility. Those conversations directly shaped what I built.
Text-to-audio for accessibility — where are the gaps?
I'm partially dyslexic. Long text has always been difficult for me: not impossible, just slow enough that by the time I reach the bottom of a page, the top has faded. Since high school, I've been converting articles, papers, and reports to audio so I could actually absorb them.
Over the years I've tried everything: screen readers (functional but robotic), browser extensions (limited), cloud TTS services (good quality but expensive for heavy use), and various read-aloud apps.
None of them were quite right. Most are designed for occasional use: read this one article, listen to this one page. They're not built for someone who processes a significant chunk of their reading through audio every single day.
The gaps I've personally experienced:
Local-first AI vs cloud AI — which is winning for voice generation?
Most voice AI services (ElevenLabs, PlayHT, Murf) run in the cloud. You upload your text, they generate audio, you download it. Per-character pricing.
But there's a clear shift toward local-first AI happening across the board. Apple's MLX framework, Ollama for LLMs, Whisper.cpp for transcription. Models are getting small enough and hardware is getting fast enough that "run it on your own machine" is a real option.
For voice generation specifically, the tradeoffs are interesting:
Cloud advantages:
How are L&D teams handling voice for e-learning content?
Enterprise learning and development teams produce a staggering amount of audio content: onboarding modules, compliance training, product walkthroughs, internal communications. And most of it needs to be updated quarterly or annually.
The traditional workflow is painful:
Script changes require re-recording (book the studio, schedule the narrator, wait for delivery)
Multi-language versions multiply the cost and timeline
Compliance updates on tight deadlines mean rushing voice talent
Brand voice consistency across hundreds of modules is nearly impossible with different narrators over time
Cloud TTS services solve some of this but introduce new problems for enterprise:
Has anyone self-produced an audiobook with AI voices?
The audiobook market is growing fast (something like 25% year-over-year), but production costs are still a major barrier for independent authors.
Professional narration typically runs $200-400 per finished hour. A 10-hour audiobook? That's $2,000-4,000 before editing and mastering. For self-published authors who might sell 100-500 copies, the math is brutal.
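To put a number on "brutal", here is a quick sketch that uses only the figures above. The per-copy framing is an illustration added here, and it ignores editing, mastering, and royalties.

```python
def narration_cost(hours: float, rate_low: float = 200, rate_high: float = 400):
    """Return the (low, high) narration cost for a given number of finished hours."""
    return hours * rate_low, hours * rate_high


def cost_per_copy(total_cost: float, copies_sold: int) -> float:
    """Narration cost that each sold copy has to recover."""
    return total_cost / copies_sold


low, high = narration_cost(10)
print(low, high)                 # 2000 4000
print(cost_per_copy(low, 500))   # 4.0  (cheapest narration, best sales)
print(cost_per_copy(high, 100))  # 40.0 (priciest narration, worst sales)
```

So narration alone adds somewhere between $4 and $40 to the cost of every copy sold, before any other production expense.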
AI narration is the obvious alternative, and platforms like Google Play Books and some ACX distributors now accept AI-narrated audiobooks (with disclosure). But the workflow is surprisingly clunky:
Cloud TTS services charge per character. A full-length book (80,000 words) burns through a lot of credits, especially when you need to regenerate chapters after editing
Most TTS tools aren't designed for long-form content. They handle single paragraphs well but struggle with maintaining consistent voice quality over hours of audio
Mastering to ACX standards (RMS levels, noise floor, peak levels) requires separate tools
Multi-voice books (dialogue between characters) need manual stitching in most tools
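The mastering check in particular is simple enough to sketch. The example below measures RMS, peak, and noise floor in dBFS from raw samples, assuming float samples in the range -1.0 to 1.0. The thresholds are the commonly cited ACX targets (RMS between -23 dB and -18 dB, peaks below -3 dB, noise floor below -60 dB); treat them as assumptions and confirm against ACX's current requirements. The function names are made up for this example.

```python
import math

# Commonly cited ACX targets (assumed here; verify before submitting).
RMS_RANGE_DB = (-23.0, -18.0)
PEAK_MAX_DB = -3.0
NOISE_FLOOR_MAX_DB = -60.0


def to_dbfs(amplitude: float) -> float:
    """Convert a linear amplitude (1.0 = full scale) to dBFS."""
    if amplitude <= 0.0:
        return float("-inf")
    return 20.0 * math.log10(amplitude)


def rms_dbfs(samples) -> float:
    """Root-mean-square level of the samples, in dBFS."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    return to_dbfs(math.sqrt(mean_square))


def peak_dbfs(samples) -> float:
    """Largest absolute sample value, in dBFS."""
    return to_dbfs(max((abs(s) for s in samples), default=0.0))


def check_levels(speech, room_tone) -> dict:
    """Check narration against the assumed targets; room_tone is a silent stretch."""
    rms = rms_dbfs(speech)
    return {
        "rms_ok": RMS_RANGE_DB[0] <= rms <= RMS_RANGE_DB[1],
        "peak_ok": peak_dbfs(speech) <= PEAK_MAX_DB,
        "noise_floor_ok": rms_dbfs(room_tone) <= NOISE_FLOOR_MAX_DB,
    }


# Synthetic stand-ins for one second of narration and one second of room tone.
tone = [0.15 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
hiss = [0.0005 * math.sin(2 * math.pi * 1000 * n / 44100) for n in range(44100)]
print(check_levels(tone, hiss))
```

A real workflow would read the samples from the exported audio files and measure the noise floor on an actual silent passage, but the pass/fail logic stays the same.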
The faceless YouTube channel trend — what voice solution are creators actually using?
Faceless YouTube channels are everywhere now. Finance explainers, tech reviews, history deep dives, true crime, Reddit compilations: millions of views, no face on camera.
The voice is the entire brand for these channels. And from what I can see, creators are split between a few approaches:
Recording their own voice: works but takes time, needs decent equipment, and not everyone likes their voice
Hiring voiceover talent: Fiverr ranges from $20-100 per video depending on length and quality. Gets expensive at 3-5 videos per week
Cloud TTS: ElevenLabs, PlayHT, etc. Quality has gotten impressive, but per-character pricing at high volume (daily or multi-weekly uploads) adds up
Free TTS tools: still sound robotic enough to get comments about it
The interesting tension: YouTube's algorithm rewards consistency and volume. The more you upload, the more the algorithm favors you. But voice production is often the bottleneck, whether it's recording time, talent costs, or TTS credits.
Are podcasters actually using AI voices? What's working?
I keep seeing "AI-powered podcast" tools pop up, and I'm curious what's actually working for people in practice.
The pitch is obvious: skip the recording, editing, and scheduling; just write a script and generate audio. But the reality seems more nuanced.
What I've been hearing:
Solo podcasters who hate the sound of their own voice are interested, but worried about authenticity. "Will my audience know?"
Show producers want AI for filler segments (intros, transitions, recaps) but keep human hosts for interviews and personality
Some people run multiple shows and physically can't record enough. For them, AI voices are a capacity multiplier, not a replacement
Non-English creators want to produce English-language versions of their shows without hiring voice talent
Vois - Studio-quality text-to-speech and voice cloning, fully local
Y Combinator offers 7 startup ideas they want to fund (Spring 2026)
As usual, Y Combinator came up with segments that are worth investing in:
1. Cursor for Product Managers
2. AI-Native Hedge Funds
3. AI-Native Agencies
4. Stablecoin Financial Services
5. AI for Government
6. Modern Metal Mills
7. AI Guidance for Physical Work
8. Large Spatial Models
9. Infra for Government Fraud Hunters
10. Make LLMs Easy to Train


