Vox
Voice in, voice out — with GitHub Copilot
151 followers
Vox is a GitHub Copilot CLI extension: run /vox and a reactive listening orb opens in its own window. Speak your turn, hear the agent reply. Voice in, voice out — on Windows, macOS, and Linux.

Hey Product Hunt 👋 I'm the maker of Vox. I use GitHub Copilot constantly and got tired of being pinned to the keyboard, so I built a way to just talk to it.

Run /vox and a reactive orb opens in its own window: you speak your turn, the session hears it, and the reply is read back. Voice in, voice out. You can barge in by voice to interrupt and correct it, there are live captions and a transcript, and it even reads your typed replies aloud. It works in the Copilot CLI and inside the Copilot app.

It's pure JavaScript with no build step. Instead of shipping Electron, it launches Chromium in app mode and uses the browser's Web Speech APIs, so it installs in one line on Windows/macOS/Linux. Free and open source (MIT).

I started it as an accessibility-minded experiment (a hands-free way to drive an agent), so I'd especially love feedback on the voice timing and the interrupt flow. Ask me anything!
Homepage: https://aasis21.github.io/vox/ · Code: https://github.com/aasis21/vox
The voice input part is straightforward enough, but the interesting question is how well it handles the parts of coding where spoken intent gets ambiguous fast. Saying "refactor that function" out loud works fine when context is obvious, but what happens when Copilot needs clarification and the back-and-forth becomes a longer conversation? Curious whether Vox supports that kind of multi-turn dialogue or whether it's essentially one-shot voice-to-prompt with no correction loop. Also wondering how it handles things like variable names, file paths, or syntax that's painful to dictate accurately.
@fberrez1 Great question - it's full multi-turn, not one-shot. The orb stays open across the whole session: you can go back and forth as many times as you want, and if Copilot needs to ask a clarifying question, it just speaks that back and waits for your next turn like a normal conversation. For gnarly variable names/paths, I lean on the transcript panel + typed fallback - you can always type a turn instead of saying it, and typed replies still get read aloud, so it mixes voice and keyboard per-turn rather than forcing pure dictation.
Voice for coding agents gets compelling when interruption and correction are first-class, not an afterthought. The agent is going to misunderstand file names, symbols, and intent sometimes; the useful workflow is being able to stop it, restate the constraint, and keep the same session alive without touching the keyboard. Nice to see barge-in called out explicitly.
@krekeltronics Exactly the philosophy — barge-in isn't bolted on, it's wired into the core turn loop. Tapping the orb (or hitting Esc) while it's thinking or speaking calls a bargeCancel() that aborts the in-flight request and stops the TTS queue immediately, so you can cut in, restate the constraint, and keep going in the same session. No waiting out a wrong turn.
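For readers wondering what a barge-in cancel like that could look like, here is a minimal sketch, not the actual Vox source. The `TurnController` class and its method names other than `bargeCancel` are hypothetical; the only assumptions are that the in-flight request honours an `AbortSignal` (as `fetch` does) and that the speech object exposes `cancel()`, as the browser's `speechSynthesis` does.

```javascript
// Hypothetical turn controller; names are illustrative, not taken from the Vox source.
class TurnController {
  constructor(synth) {
    // `synth` is anything with a cancel() method, e.g. window.speechSynthesis in a browser.
    this.synth = synth;
    this.abortController = null;
    this.queue = []; // sentences waiting to be spoken
  }

  // Start a turn: returns an AbortSignal to pass to fetch() or any abortable request.
  beginTurn() {
    this.abortController = new AbortController();
    return this.abortController.signal;
  }

  // Queue a sentence of the reply for speaking.
  enqueue(sentence) {
    this.queue.push(sentence);
  }

  // Barge-in: abort the in-flight request, drop queued speech, silence current audio.
  bargeCancel() {
    if (this.abortController) this.abortController.abort();
    this.abortController = null;
    this.queue.length = 0;
    this.synth.cancel();
  }
}
```

In a browser this might be wired up as `new TurnController(window.speechSynthesis)`, with `bargeCancel()` called from the orb's click handler and an Esc key listener.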
How does it handle accents or noisy environments in practice, and is the voice model running locally or hitting an external API that could add latency or cost per conversation?
@feyzagpyf It uses the browser's native Web Speech API (Chrome/Edge), so there's no separate model that Vox ships or bills for. Accent and noise handling is whatever your browser's built-in recognizer does; in Chrome that's generally solid, but it calls out to Google's speech service (not fully on-device), so it needs a network connection. Vox itself adds no extra latency or cost: zero API keys, zero cloud calls of ours. There's definitely room to improve here, and a local/offline recognition option (e.g. Whisper-based) is on my radar for a future version, especially for noisy environments and stronger accent coverage.
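As a rough illustration of relying on the browser's recognizer rather than a bundled model, here is a small sketch using the standard Web Speech API. The `createRecognizer` helper, its parameters, and the `en-US` language setting are assumptions made for this example, not Vox's code.

```javascript
// Hypothetical helper: returns a configured recognizer, or null when the
// environment has no Web Speech recognition support.
function createRecognizer(globalObj, onTranscript) {
  // Chromium-based browsers expose the constructor with a webkit prefix.
  const Recognition =
    globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition;
  if (!Recognition) return null;

  const rec = new Recognition();
  rec.lang = "en-US";        // assumption: pick whatever language the user speaks
  rec.interimResults = true; // partial results, useful for live captions
  rec.continuous = false;    // stop after one utterance (one spoken turn)

  rec.onresult = (event) => {
    // Report the most recent result and whether the recognizer considers it final.
    const result = event.results[event.results.length - 1];
    onTranscript(result[0].transcript, result.isFinal);
  };
  return rec;
}
```

In a page this would be called as `createRecognizer(window, handler)` followed by `start()` on the returned object; a `null` return is the cue to fall back to typed input.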
@thys_beesman Pretty smooth — sentences are queued and spoken as they stream in (so it starts talking before the full reply arrives), and interrupting is a single tap/Esc that instantly kills both the audio and the in-flight response. Try it — the "barge-in" is honestly my favorite detail to demo.
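One way to get that "starts talking before the full reply arrives" behaviour is to buffer streamed text and hand off each sentence as soon as it is complete. The sketch below is an assumption about how this could be done, not Vox's implementation; `createSentenceStream` and the `speak` callback are hypothetical names, and the sentence boundary rule (end punctuation followed by whitespace) is deliberately naive, so abbreviations like "e.g." would split early.

```javascript
// Hypothetical sentence streamer: feed it text chunks as they arrive and it
// calls speak() once per completed sentence.
function createSentenceStream(speak) {
  let buffer = "";
  return {
    push(chunk) {
      buffer += chunk;
      let match;
      // A sentence counts as complete once end punctuation is followed by whitespace.
      while ((match = /^([\s\S]*?[.!?]+)\s+/.exec(buffer)) !== null) {
        speak(match[1].trim());
        buffer = buffer.slice(match[0].length);
      }
    },
    // Call when the reply has finished streaming to speak any trailing text.
    flush() {
      const rest = buffer.trim();
      buffer = "";
      if (rest) speak(rest);
    },
  };
}
```

In a browser, `speak` could be `(text) => speechSynthesis.speak(new SpeechSynthesisUtterance(text))`, which relies on the browser's own utterance queue to play sentences in order.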
Does the orb stay open in the background while I keep coding, or do I have to keep invoking /vox every time I want to switch from typing to talking?
@nisaxvhd It stays open in the background, so you don't need to re-run /vox each time. Once it's open, just keep coding as normal; tap the orb or hit Space whenever you want to switch to talking, and it goes right back to listening for your session. /vox again only comes into play if you want to switch which session the orb is listening to (it auto-focuses on whichever one last called it) or if it's been closed via /vox-stop.
launching Chromium in app mode instead of shipping Electron is such a clean hack, one-line install with no build step because the browser already has the speech APIs. more tools should steal this
the barge-in interrupt is the detail that makes voice actually usable btw, nothing worse than waiting out a wrong answer
@yarslav Thank you! Yeah, launching Chrome/Edge in app mode was the unlock: you get a real desktop-style window with zero Electron overhead, and the Web Speech APIs just work natively. Glad the barge-in landed too; that was the detail I iterated on most.
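For anyone who wants to borrow the app-mode trick, here is a minimal Node sketch of the general idea. It is not taken from the Vox repository: `appModeArgs`, `openAppWindow`, the default window size, and the example URL are all hypothetical. The `--app=` and `--window-size=` switches are standard Chromium command-line flags, and the browser path has to be located per platform by the caller.

```javascript
const { spawn } = require("node:child_process");

// Build the argument list for a chromeless app-mode window.
// The default size is an arbitrary choice for this sketch.
function appModeArgs(url, { width = 420, height = 560 } = {}) {
  return [`--app=${url}`, `--window-size=${width},${height}`];
}

// Launch a Chromium-based browser detached from the calling CLI process.
// `browserPath` is assumed to point at a Chrome, Edge, or Chromium binary.
function openAppWindow(browserPath, url, size) {
  const child = spawn(browserPath, appModeArgs(url, size), {
    detached: true,  // let the window outlive the launching process
    stdio: "ignore", // do not tie the browser to this process's streams
  });
  child.unref();
  return child;
}
```

A call such as `openAppWindow("/usr/bin/chromium", "http://127.0.0.1:4321/")` would open the page in its own window without tabs or an address bar; the path and port here are placeholders.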