Vox - Voice in, voice out — with GitHub Copilot

by•12h ago

Vox is a GitHub Copilot CLI extension: run /vox and a reactive listening orb opens in its own window. Speak your turn, hear the agent reply. Voice in, voice out — on Windows, macOS, and Linux.

Replies

Hunter

📌

Hey Product Hunt 👋 I'm the maker of Vox. I use GitHub Copilot constantly and got tired of being pinned to the keyboard, so I built a way to just talk to it. Run /vox and a reactive orb opens in its own window — you speak your turn, the session hears it, and the reply is read back. Voice in, voice out. You can barge in by voice to interrupt and correct it, there are live captions and a transcript, and it even reads your typed replies aloud. It works in the Copilot CLI and inside the Copilot app. It's pure JavaScript with no build step — it uses the browser's Web Speech APIs by launching Chromium in app mode instead of shipping Electron — so it installs in one line on Windows/macOS/Linux. Free and open source (MIT). I started it as an accessibility-minded experiment (a hands-free way to drive an agent), so I'd especially love feedback on the voice timing and the interrupt flow. Ask me anything!

Homepage: https://aasis21.github.io/vox/ · Code: https://github.com/aasis21/vox

19h ago

Foyer

The voice input part is straightforward enough, but the interesting question is how well it handles the parts of coding where spoken intent gets ambiguous fast. Saying "refactor that function" out loud works fine when context is obvious, but what happens when Copilot needs clarification and the back-and-forth becomes a longer conversation? Curious whether Vox supports that kind of multi-turn dialogue or whether it's essentially one-shot voice-to-prompt with no correction loop. Also wondering how it handles things like variable names, file paths, or syntax that's painful to dictate accurately.

11h ago

Hunter

@fberrez1 Great question - it's full multi-turn, not one-shot. The orb stays open across the whole session: you can go back and forth as many times as you want, and if Copilot needs to ask a clarifying question, it just speaks that back and waits for your next turn like a normal conversation. For gnarly variable names/paths, I lean on the transcript panel + typed fallback - you can always type a turn instead of saying it, and typed replies still get read aloud, so it mixes voice and keyboard per-turn rather than forcing pure dictation.

1h ago

launching Chromium in app mode instead of shipping Electron is such a clean hack, one-line install with no build step because the browser already has the speech APIs. more tools should steal this

the barge-in interrupt is the detail that makes voice actually usable btw, nothing worse than waiting out a wrong answer

10h ago

Hunter

@yarslav Thank you! Yeah, launching Chrome/Edge in app mode was the unlock: you get a real desktop-style window with zero Electron overhead, and the Web Speech APIs just work natively. Glad the barge-in landed too; that was the detail I iterated on most.
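
For readers curious what "app mode" looks like in practice, here is a minimal sketch, not taken from the Vox source. It assumes a Node.js script, a caller-supplied path to a Chrome/Edge/Chromium binary, and a hypothetical local URL; the helper names `appModeArgs` and `openAppWindow` and the window size are made up for illustration. The `--app` and `--window-size` switches are standard Chromium command-line flags.

```javascript
const { spawn } = require("child_process");

// Build the argument list for a Chromium-family browser in app mode.
// --app=<url> opens the page in a window without tabs or an address bar.
function appModeArgs(url, { width = 360, height = 480 } = {}) {
  return [`--app=${url}`, `--window-size=${width},${height}`];
}

// Launch the window. browserPath must point at a Chrome/Edge/Chromium
// binary; locating it per platform is left out of this sketch.
function openAppWindow(browserPath, url, size) {
  const child = spawn(browserPath, appModeArgs(url, size), {
    detached: true,
    stdio: "ignore",
  });
  child.unref(); // let the launching process exit without closing the window
  return child;
}
```

A call such as `openAppWindow(pathToChrome, "http://localhost:3000")` would open the page in its own window, where `pathToChrome` and the URL are placeholders. The page inside that window can then use the Web Speech APIs as in any regular tab.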

1h ago

How does it handle accents or noisy environments in practice, and is the voice model running locally or hitting an external API that could add latency or cost per conversation?

10h ago

Hunter

@feyzagpyf It uses the browser's native Web Speech API (Chrome/Edge), so there's no separate model that Vox ships or bills for. Accent and noise handling is whatever your browser's built-in recognizer does, which in Chrome is generally solid but does call out to Google's speech service (not fully on-device), so it needs network. There's no extra latency or cost from Vox itself: zero API keys, zero cloud calls of ours. There's definitely room to improve here, and a local/offline recognition option (e.g. Whisper-based) is on my radar for a future version, especially for noisy environments and stronger accent coverage.

1h ago

I love the idea of talking to Copilot. How smooth is the voice flow when you interrupt or correct mid-conversation?

10h ago

Hunter

@thys_beesman Pretty smooth — sentences are queued and spoken as they stream in (so it starts talking before the full reply arrives), and interrupting is a single tap/Esc that instantly kills both the audio and the in-flight response. Try it — the "barge-in" is honestly my favorite detail to demo.
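
The "speak sentences as they stream in" idea can be sketched roughly as follows. This is an illustrative guess at the technique, not Vox's actual code: the class name `SentenceQueue`, its methods, and the simple punctuation-based sentence boundary are all assumptions. The `speak` callback stands in for whatever does the talking; in a browser that could be `speechSynthesis.speak(new SpeechSynthesisUtterance(sentence))`.

```javascript
// Hypothetical sketch: names are illustrative, not taken from the Vox source.
class SentenceQueue {
  constructor(speak) {
    this.speak = speak; // called once per complete sentence
    this.buffer = "";   // text received but not yet spoken
  }

  // Feed one streamed chunk and speak every sentence it completes.
  push(chunk) {
    this.buffer += chunk;
    // Assumed boundary rule: ., ! or ? followed by whitespace.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.buffer.slice(start, end).trim();
      if (sentence) this.speak(sentence);
      start = end;
    }
    // Keep the unfinished tail for the next chunk.
    this.buffer = this.buffer.slice(start);
  }

  // Call when the stream ends to speak whatever is left.
  flush() {
    const rest = this.buffer.trim();
    this.buffer = "";
    if (rest) this.speak(rest);
  }

  // Drop unspoken text, for example when the user interrupts.
  clear() {
    this.buffer = "";
  }
}
```

Because each sentence is handed off as soon as its closing punctuation arrives, speech can begin while later chunks are still streaming, which matches the behavior described above.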

1h ago

Does the orb stay open in the background while I keep coding, or do I have to keep invoking /vox every time I want to switch from typing to talking?

8h ago

Hunter

@nisaxvhd It stays open in the background, so you don't need to re-run /vox each time. Once it's open, just keep coding as normal; tap the orb or hit Space whenever you want to switch to talking, and it goes right back to listening for your session. Running /vox again only comes into play if you want to switch which session the orb is listening to (it auto-focuses to whichever one last called it) or if it's been closed via /vox-stop.

1h ago

I appreciate the simple setup process. Why not include offline support? I think limited offline features would increase reliability.

8h ago

Hunter

@alex_bravo1 Appreciate that! Offline is on my radar — right now it leans on the browser's native Web Speech API for simplicity/zero-install, but that does need network for recognition. A local/offline mode (likely Whisper-based) would genuinely help reliability in spotty-network or privacy-sensitive setups, so it's a good candidate for a future version.

1h ago

That's clever. Any plans to support other AI coding assistants beyond GitHub Copilot?

7h ago

Hunter

@dhiraj_patel5 Right now it's built tightly on the Copilot CLI's extension/SDK hooks (that's how it taps into turns, streaming replies, and session state) — so it's Copilot-specific today, not agent-agnostic. That said, the voice layer itself (mic capture, barge-in, TTS queue) is a self-contained browser front-end, so porting the "wiring" to another agent's extension API is architecturally possible if there's interest — just not on the roadmap yet.

1h ago

Voice for coding agents gets compelling when interruption and correction are first-class, not an afterthought. The agent is going to misunderstand file names, symbols, and intent sometimes; the useful workflow is being able to stop it, restate the constraint, and keep the same session alive without touching the keyboard. Nice to see barge-in called out explicitly.

7h ago

Hunter

@krekeltronics Exactly the philosophy — barge-in isn't bolted on, it's wired into the core turn loop. Tapping the orb (or hitting Esc) while it's thinking or speaking calls a bargeCancel() that aborts the in-flight request and stops the TTS queue immediately, so you can cut in, restate the constraint, and keep going in the same session. No waiting out a wrong turn.
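
A barge-in cancel along these lines might look like the sketch below. Only the name `bargeCancel()` comes from the reply above; the `createTurn` factory, the state names, and the `stopSpeech` callback are assumptions made for illustration, and the real implementation may differ. `AbortController` is a standard API in both browsers and Node.js, and in a browser `stopSpeech` could simply call `speechSynthesis.cancel()`.

```javascript
// Hypothetical sketch of a barge-in cancel; only the name bargeCancel()
// comes from the thread, everything else is assumed for illustration.
function createTurn(stopSpeech) {
  const controller = new AbortController();
  const ttsQueue = [];     // sentences waiting to be spoken
  let state = "thinking";  // "thinking" | "speaking" | "listening"

  return {
    // Pass this to fetch() or any other abortable request.
    signal: controller.signal,

    enqueue(sentence) {
      if (controller.signal.aborted) return; // ignore chunks that arrive late
      ttsQueue.push(sentence);
      state = "speaking";
    },

    bargeCancel() {
      controller.abort();   // abort the in-flight request
      ttsQueue.length = 0;  // drop sentences not yet spoken
      stopSpeech();         // silence audio that is already playing
      state = "listening";  // hand the turn back to the user
    },

    get state() {
      return state;
    },
    get pending() {
      return ttsQueue.length;
    },
  };
}
```

Keeping the abort signal, the speech queue, and the turn state behind one call means an interrupt cannot leave audio playing for a request that has already been cancelled.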

1h ago

The reactive listening orb in its own dedicated window is a really nice touch, keeps the voice interaction feeling like a proper companion rather than just another terminal pane.

7h ago

Hunter

@alperen397545 Thanks! That was very deliberate — I wanted it to feel like a companion you glance at and talk to, not just another pane competing for attention in your terminal. Launching it as its own chrome-less app-mode window (rather than a browser tab or Electron app) is what makes that possible while still keeping the Web Speech APIs working natively.

1h ago