Vox is a GitHub Copilot CLI extension: run /vox and a reactive listening orb opens in its own window. Speak your turn, hear the agent reply. Voice in, voice out, on Windows, macOS, and Linux.
Replies
Hey Product Hunt! I'm the maker of Vox. I use GitHub Copilot constantly and got tired of being pinned to the keyboard, so I built a way to just talk to it. Run /vox and a reactive orb opens in its own window - you speak your turn, the session hears it, and the reply is read back. Voice in, voice out. You can barge in by voice to interrupt and correct it, there are live captions and a transcript, and it even reads your typed replies aloud. It works in the Copilot CLI and inside the Copilot app. It's pure JavaScript with no build step - it uses the browser's Web Speech APIs by launching Chromium in app mode instead of shipping Electron - so it installs in one line on Windows/macOS/Linux. Free and open source (MIT). I started it as an accessibility-minded experiment (a hands-free way to drive an agent), so I'd especially love feedback on the voice timing and the interrupt flow. Ask me anything!
The voice input part is straightforward enough, but the interesting question is how well it handles the parts of coding where spoken intent gets ambiguous fast. Saying "refactor that function" out loud works fine when context is obvious, but what happens when Copilot needs clarification and the back-and-forth becomes a longer conversation? Curious whether Vox supports that kind of multi-turn dialogue or whether it's essentially one-shot voice-to-prompt with no correction loop. Also wondering how it handles things like variable names, file paths, or syntax that's painful to dictate accurately.
@fberrez1 Great question - it's full multi-turn, not one-shot. The orb stays open across the whole session: you can go back and forth as many times as you want, and if Copilot needs to ask a clarifying question, it just speaks that back and waits for your next turn like a normal conversation. For gnarly variable names/paths, I lean on the transcript panel + typed fallback - you can always type a turn instead of saying it, and typed replies still get read aloud, so it mixes voice and keyboard per-turn rather than forcing pure dictation.
launching Chromium in app mode instead of shipping Electron is such a clean hack, one-line install with no build step because the browser already has the speech APIs. more tools should steal this
the barge-in interrupt is the detail that makes voice actually usable btw, nothing worse than waiting out a wrong answer
@yarslav Thank you! Yeah, launching Chrome/Edge in app mode was the unlock - you get a real desktop-style window with zero Electron overhead and the Web Speech APIs just work natively. Glad the barge-in landed too, that was the detail I iterated on most.
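For readers curious what the app-mode launch might look like, here is a minimal sketch. It relies on Chromium's `--app` and `--window-size` command-line switches; the function names, the default window size, and the way `spawn` is passed in are assumptions for illustration, not Vox's actual code.

```javascript
// Sketch only: buildAppModeArgs and launchOrb are hypothetical names,
// and the default window size is an arbitrary choice.
function buildAppModeArgs(url, { width = 360, height = 480 } = {}) {
  return [
    `--app=${url}`,                     // Chromium switch: open the URL in a chrome-less app window
    `--window-size=${width},${height}`, // Chromium switch: initial window size
  ];
}

// `spawn` is expected to be Node's child_process.spawn, supplied by the caller.
// `browserPath` is the path to a Chrome or Edge binary found by the caller.
function launchOrb(browserPath, url, spawn) {
  const child = spawn(browserPath, buildAppModeArgs(url), {
    detached: true,  // keep the window alive independently of the launcher
    stdio: "ignore", // do not tie the browser's output to the terminal
  });
  child.unref();     // let the launching process exit without waiting
  return child;
}
```

A caller would pass in `require("node:child_process").spawn` and a locally served page URL. Locating the browser binary differs per operating system and is left out here.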
How does it handle accents or noisy environments in practice, and is the voice model running locally or hitting an external API that could add latency or cost per conversation?
@feyzagpyf It uses the browser's native Web Speech API (Chrome/Edge), so there's no separate model Vox ships or bills for - accent/noise handling is whatever your browser's built-in recognizer does, which in Chrome is generally solid but does call out to Google's speech service (not fully on-device), so it needs network. No extra latency/cost from Vox itself though - zero API keys, zero cloud calls of ours. Definitely room to improve here though - a local/offline recognition option (e.g. Whisper-based) is on my radar for a future version, especially for noisy environments and stronger accent coverage.
I love the idea of talking to Copilot. How smooth is the voice flow when you interrupt or correct mid-conversation?
@thys_beesman Pretty smooth - sentences are queued and spoken as they stream in (so it starts talking before the full reply arrives), and interrupting is a single tap/Esc that instantly kills both the audio and the in-flight response. Try it - the "barge-in" is honestly my favorite detail to demo.
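As an illustration of the "queue sentences as they stream in" idea, here is a rough sketch. The `SentenceQueue` class and its naive sentence-boundary rule are assumptions, not Vox's implementation; in a browser the `speak` callback could wrap `speechSynthesis.speak(new SpeechSynthesisUtterance(text))`.

```javascript
// Sketch only: SentenceQueue is a hypothetical name. It buffers streamed
// text chunks and hands each complete sentence to a `speak` callback.
class SentenceQueue {
  constructor(speak) {
    this.speak = speak; // called once per complete sentence
    this.buffer = "";   // text received so far that has not been spoken
  }

  // Add a streamed chunk and speak every sentence it completes.
  push(chunk) {
    this.buffer += chunk;
    // Naive rule: a sentence ends at ., ! or ? followed by whitespace.
    // Abbreviations such as "e.g. " will split early under this rule.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.buffer.slice(start, end).trim();
      if (sentence) this.speak(sentence);
      start = end;
    }
    this.buffer = this.buffer.slice(start); // keep the unfinished tail
  }

  // Call when the stream ends to speak whatever is left.
  flush() {
    const rest = this.buffer.trim();
    this.buffer = "";
    if (rest) this.speak(rest);
  }
}
```

Because speech starts at the first sentence boundary, playback can begin well before the full reply has arrived.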
Does the orb stay open in the background while I keep coding, or do I have to keep invoking /vox every time I want to switch from typing to talking?
@nisaxvhd It stays open in the background - you don't need to re-run /vox each time. Once it's open, just keep coding as normal; tap the orb or hit Space whenever you want to switch to talking, and it goes right back to listening for your session. /vox again only comes into play if you want to switch which session the orb is listening to (it auto-focuses to whichever one last called it) or if it's been closed via /vox-stop.
I appreciate the simple setup process. Why not include offline support? I think limited offline features would increase reliability.
@alex_bravo1 Appreciate that! Offline is on my radar - right now it leans on the browser's native Web Speech API for simplicity/zero-install, but that does need network for recognition. A local/offline mode (likely Whisper-based) would genuinely help reliability in spotty-network or privacy-sensitive setups, so it's a good candidate for a future version.
That's clever. Any plans to support other AI coding assistants beyond GitHub Copilot?
@dhiraj_patel5 Right now it's built tightly on the Copilot CLI's extension/SDK hooks (that's how it taps into turns, streaming replies, and session state) - so it's Copilot-specific today, not agent-agnostic. That said, the voice layer itself (mic capture, barge-in, TTS queue) is a self-contained browser front-end, so porting the "wiring" to another agent's extension API is architecturally possible if there's interest - just not on the roadmap yet.
Voice for coding agents gets compelling when interruption and correction are first-class, not an afterthought. The agent is going to misunderstand file names, symbols, and intent sometimes; the useful workflow is being able to stop it, restate the constraint, and keep the same session alive without touching the keyboard. Nice to see barge-in called out explicitly.
@krekeltronics Exactly the philosophy - barge-in isn't bolted on, it's wired into the core turn loop. Tapping the orb (or hitting Esc) while it's thinking or speaking calls a bargeCancel() that aborts the in-flight request and stops the TTS queue immediately, so you can cut in, restate the constraint, and keep going in the same session. No waiting out a wrong turn.
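A barge-in handler along these lines could be sketched as follows. Only the name `bargeCancel()` comes from the reply above; the `TurnController` class, its fields, and the injected `stopSpeaking` callback are assumptions for illustration. In a browser, `stopSpeaking` could call `window.speechSynthesis.cancel()`.

```javascript
// Sketch only: TurnController and its fields are hypothetical. It pairs an
// AbortController for the in-flight request with a queue of pending speech.
class TurnController {
  constructor(stopSpeaking = () => {}) {
    this.stopSpeaking = stopSpeaking; // silences audio that is already playing
    this.controller = null;           // AbortController for the current turn
    this.ttsQueue = [];               // sentences waiting to be spoken
  }

  // Begin a turn and return a signal to pass to fetch() or a stream reader.
  startTurn() {
    this.controller = new AbortController();
    this.ttsQueue = [];
    return this.controller.signal;
  }

  enqueue(sentence) {
    this.ttsQueue.push(sentence);
  }

  // Cut in: abort the request, drop queued speech, stop current audio.
  bargeCancel() {
    if (this.controller) this.controller.abort();
    this.ttsQueue.length = 0;
    this.stopSpeaking();
  }
}
```

Starting a new turn creates a fresh controller, so the same session continues after an interruption.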
The reactive listening orb in its own dedicated window is a really nice touch, keeps the voice interaction feeling like a proper companion rather than just another terminal pane.
@alperen397545 Thanks! That was very deliberate - I wanted it to feel like a companion you glance at and talk to, not just another pane competing for attention in your terminal. Launching it as its own chrome-less app-mode window (rather than a browser tab or Electron app) is what makes that possible while still keeping the Web Speech APIs working natively.
Homepage: https://aasis21.github.io/vox/ · Code: https://github.com/aasis21/vox