Praney Behl


Vois
Building Vois — desktop voice AI studio

Forums

v1.3.0: script import/export, progress bars, and an embarrassing cloning bug

Shipped another update today. Two things people asked for, one thing I should have caught earlier, and one quiet fix.

The embarrassing one first:

Voice cloning could silently fail. If you hadn't downloaded the Expressive engine model yet, the cloning process would run, appear to finish successfully, even show engine badges on the card. But the cloned voice wouldn't actually work. It looked fine. It wasn't.
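The fix for this class of bug is to fail fast instead of letting the happy path run to completion. A minimal sketch of the guard, in Python with hypothetical names (`clone_voice`, `MissingModelError`, and the downloaded-models set are all illustrative, not Vois's actual code):

```python
# Hypothetical sketch: refuse to start a clone when the engine's model
# isn't downloaded, rather than letting it "succeed" silently.

class MissingModelError(Exception):
    """Raised when a clone targets an engine whose model isn't present."""

def clone_voice(sample_path, engine, downloaded_models):
    # Guard first: e.g. the Expressive engine model must exist locally.
    if engine not in downloaded_models:
        raise MissingModelError(
            f"Engine '{engine}' model not downloaded; clone would fail silently."
        )
    # ... actual cloning would happen here ...
    return {"sample": sample_path, "engine": engine, "status": "ready"}
```

With the guard, the UI gets an error it can surface immediately instead of rendering engine badges on a voice that was never usable.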

The real cost breakdown of running a faceless YouTube channel

Nobody talks honestly about what faceless YouTube channels actually cost to run. So here's a real breakdown.

Monthly costs for a 2-video/week channel:

Voiceover (the biggest variable):

What we shipped in v1.2.1 (Windows GPU + stability fixes)

This one's mostly a Windows release.

The main change: if you're on Windows and using the Expressive or Multilingual engine, generation now runs on your GPU rather than your CPU. It's faster. It kicks in automatically with no setup needed. If your GPU doesn't support it for some reason, the app falls back to CPU without any fuss. You'll see a small GPU label in the engine selector when it's active.
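The "use the GPU if it works, otherwise fall back quietly" behavior described above boils down to a probe-and-fallback pattern. A minimal sketch, assuming a hypothetical `probe_gpu` callable (in practice the probe would try a tiny allocation and one operation on the GPU):

```python
# Hypothetical sketch of GPU selection with a silent CPU fallback.

def select_device(probe_gpu):
    """Return 'gpu' when the probe succeeds, otherwise fall back to 'cpu'."""
    try:
        if probe_gpu():
            return "gpu"
    except Exception:
        pass  # any probe failure means the GPU path isn't usable
    return "cpu"

# A probe that raises (bad driver, unsupported card) still lands on CPU,
# which is what "falls back without any fuss" requires.
```

The key design choice is that a crashing probe is treated the same as a negative one, so a broken driver never surfaces as a generation error.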

Two other fixes landed with it:

Some Windows users were hitting a crash on startup. Tracked it down and patched it.

Prototyping NPC dialogue on a zero budget

I keep watching indie game devs burn time and money on voice acting way too early in development. Here's what actually works when you're prototyping on a budget of zero.

Phase 1: Text-only playtesting

Start here. Seriously. Put your dialogue in text boxes and watch playtesters read it. You'll cut 30% of your lines before anyone speaks a word. Written dialogue that reads well often sounds terrible spoken aloud, and vice versa. Test the script before you voice it.

New in v1.0.11: Pause nodes for precise silence control in scripts

Quick update from the trenches.

One thing that kept coming up in early feedback: there was no way to control silence in generated audio. You'd write a dramatic script, generate it, and the timing between lines felt off. No breathing room. No pauses for effect.

So we built Pause nodes.
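One way to picture how Pause nodes slot into a script: the script becomes a sequence of text and pause nodes, and rendering yields speech and silence segments for the engine. A minimal sketch with hypothetical node shapes (Vois's internal format isn't shown here):

```python
# Hypothetical sketch: a script is a list of ("text", str) and
# ("pause", seconds) nodes; render() flattens it into engine segments.

def render(nodes):
    segments = []
    for node in nodes:
        if node[0] == "text":
            segments.append(("speech", node[1]))
        elif node[0] == "pause":
            segments.append(("silence", float(node[1])))  # duration in seconds
    return segments

script = [
    ("text", "It was quiet."),
    ("pause", 1.5),            # dramatic beat before the reveal
    ("text", "Too quiet."),
]
```

Because pauses are first-class nodes rather than punctuation hacks, the 1.5 seconds survives regeneration and editing exactly as written.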

Week 1 post-launch: what broke, what surprised us, what we shipped

You'd think I'd be ready for launch week chaos. I was not.

Vois launched here on March 5. Here's the honest recap.

The numbers:

  • 99 upvotes, #13 for the day

  • 116 followers

  • 9 comments on the launch post

  • ~50 downloads in Week 1

  • First Product Hunt review received

Launching Vois on Thursday 5th March — a desktop voice AI studio

Hey PH community,

I'm launching Vois on Thursday: a desktop voice AI studio I've been building as a solo maker for the past year.

Some of you may have seen my earlier threads here about voice production costs for game devs, podcast workflows, audiobook production, and accessibility. Those conversations directly shaped what I built.

Text-to-audio for accessibility — where are the gaps?

I'm partially dyslexic. Long text has always been difficult for me: not impossible, just slow enough that by the time I reach the bottom of a page, the top has faded. Since high school, I've been converting articles, papers, and reports to audio so I could actually absorb them.

Over the years I've tried everything: screen readers (functional but robotic), browser extensions (limited), cloud TTS services (good quality but expensive for heavy use), and various read-aloud apps.

None of them were quite right. Most are designed for occasional use: read this one article, listen to this one page. They're not built for someone who processes a significant chunk of their reading through audio every single day.

The gaps I've personally experienced:

Local-first AI vs cloud AI — which is winning for voice generation?

Most voice AI services (ElevenLabs, PlayHT, Murf) run in the cloud. You upload your text, they generate audio, you download it. Per-character pricing.

But there's a clear shift toward local-first AI happening across the board. Apple's MLX framework, Ollama for LLMs, Whisper.cpp for transcription. Models are getting small enough and hardware is getting fast enough that "run it on your own machine" is a real option.

For voice generation specifically, the tradeoffs are interesting:

Cloud advantages:

How are L&D teams handling voice for e-learning content?

Enterprise learning and development teams produce a staggering amount of audio content: onboarding modules, compliance training, product walkthroughs, internal communications. And most of it needs to be updated quarterly or annually.

The traditional workflow is painful:

  • Script changes require re-recording (book the studio, schedule the narrator, wait for delivery)

  • Multi-language versions multiply the cost and timeline

  • Compliance updates on tight deadlines mean rushing voice talent

  • Brand voice consistency across hundreds of modules is nearly impossible with different narrators over time

Cloud TTS services solve some of this but introduce new problems for enterprise:

Has anyone self-produced an audiobook with AI voices?

The audiobook market is growing fast (something like 25% year-over-year), but production costs are still a major barrier for independent authors.

Professional narration typically runs $200-400 per finished hour. A 10-hour audiobook? That's $2,000-4,000 before editing and mastering. For self-published authors who might sell 100-500 copies, the math is brutal.

AI narration is the obvious alternative, and platforms like Google Play Books and some ACX distributors now accept AI-narrated audiobooks (with disclosure). But the workflow is surprisingly clunky:

  • Cloud TTS services charge per character. A full-length book (80,000 words) burns through a lot of credits, especially when you need to regenerate chapters after editing

  • Most TTS tools aren't designed for long-form content. They handle single paragraphs well but struggle with maintaining consistent voice quality over hours of audio

  • Mastering to ACX standards (RMS levels, noise floor, peak levels) requires separate tools

  • Multi-voice books (dialogue between characters) need manual stitching in most tools
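The ACX targets mentioned above are concrete numbers: RMS between -23 dB and -18 dB, peaks no higher than -3 dB, and a noise floor at or below -60 dB RMS. Checking them is a few lines of signal math. A minimal sketch in pure Python (the thresholds are ACX's published limits; the sample format and function names are illustrative):

```python
import math

def dbfs_rms(samples):
    """RMS level in dBFS for float samples in [-1.0, 1.0]."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")

def dbfs_peak(samples):
    """Peak level in dBFS."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def meets_acx(voiced, silence):
    """Check the three ACX submission limits: RMS in [-23, -18] dB,
    peaks <= -3 dB, noise floor <= -60 dB RMS."""
    rms = dbfs_rms(voiced)
    peak = dbfs_peak(voiced)
    floor = dbfs_rms(silence)
    return -23.0 <= rms <= -18.0 and peak <= -3.0 and floor <= -60.0
```

In practice you'd run this per chapter on the decoded PCM, so a failing file is caught before upload rather than bounced by ACX's automated checks.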

The faceless YouTube channel trend — what voice solution are creators actually using?

Faceless YouTube channels are everywhere now. Finance explainers, tech reviews, history deep dives, true crime, Reddit compilations: millions of views, no face on camera.

The voice is the entire brand for these channels. And from what I can see, creators are split between a few approaches:

  1. Recording their own voice: works but takes time, needs decent equipment, and not everyone likes their voice

  2. Hiring voiceover talent: Fiverr ranges from $20-100 per video depending on length and quality. Gets expensive at 3-5 videos per week

  3. Cloud TTS: ElevenLabs, PlayHT, etc. Quality has gotten impressive, but per-character pricing at high volume (daily or multi-weekly uploads) adds up

  4. Free TTS tools: still robotic enough to get comments about it

The interesting tension: YouTube's algorithm rewards consistency and volume. The more you upload, the more the algorithm favors you. But voice production is often the bottleneck, whether it's recording time, talent costs, or TTS credits.

Are podcasters actually using AI voices? What's working?

I keep seeing "AI-powered podcast" tools pop up, and I'm curious what's actually working for people in practice.

The pitch is obvious: skip the recording, editing, and scheduling, and just write a script and generate audio. But the reality seems more nuanced.

What I've been hearing:

  • Solo podcasters who hate the sound of their own voice are interested, but worried about authenticity. "Will my audience know?"

  • Show producers want AI for filler segments (intros, transitions, recaps) but keep human hosts for interviews and personality

  • Some people run multiple shows and physically can't record enough; AI voices are a capacity multiplier, not a replacement

  • Non-English creators want to produce English-language versions of their shows without hiring voice talent

Praney Behl

11d ago

Vois - Studio-quality text-to-speech and voice cloning, fully local

Vois is a desktop voice studio for turning scripts, ebooks, articles, and podcasts into natural audio with 63 voices, voice cloning, and pro editing. Cloud voice tools charge per character, cap usage, and upload your scripts; Vois runs studio-quality speech, cloning, and editing entirely on your laptop or desktop, with no uploads, no per-character fees, and no usage caps.
Nika in p/yc (Y Combinator)

1mo ago

Y Combinator offers 7 startups ideas they want to fund (Spring 2026)

As usual, Y Combinator came up with segments it considers worth investing in:

1. Cursor for Product Managers
2. AI-Native Hedge Funds
3. AI-Native Agencies
4. Stablecoin Financial Services
5. AI for Government
6. Modern Metal Mills
7. AI Guidance for Physical Work
8. Large Spatial Models
9. Infra for Government Fraud Hunters
10. Make LLMs Easy to Train

Praney Behl

8mo ago

CCProxy - Keep Claude Code's magic, slash costs by 90%

Love Claude Code but hate the $200/month cost? CCProxy lets you use ANY AI model with Claude Code. Keep the best coding interface, slash costs by 90%. Free & open source. Works with Gemini, OpenAI, Kimi K2, Qwen3 Coder, 100+ models. Simple one-line setup.