Vox - Voice in, voice out, with GitHub Copilot

Vox is a GitHub Copilot CLI extension: run /vox and a reactive listening orb opens in its own window. Speak your turn, hear the agent reply. Voice in, voice out, on Windows, macOS, and Linux.

Hey Product Hunt 👋 I'm the maker of Vox. I use GitHub Copilot constantly and got tired of being pinned to the keyboard, so I built a way to just talk to it. Run /vox and a reactive orb opens in its own window: you speak your turn, the session hears it, and the reply is read back. Voice in, voice out. You can barge in by voice to interrupt and correct it, there are live captions and a transcript, and it even reads your typed replies aloud. It works in the Copilot CLI and inside the Copilot app. It's pure JavaScript with no build step. Instead of shipping Electron, it launches Chromium in app mode to use the browser's Web Speech APIs, so it installs in one line on Windows/macOS/Linux. Free and open source (MIT). I started it as an accessibility-minded experiment (a hands-free way to drive an agent), so I'd especially love feedback on the voice timing and the interrupt flow. Ask me anything!
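For readers curious how "Chromium in app mode instead of Electron" can work, here is a minimal Node sketch under stated assumptions. It relies only on real Chromium switches (`--app`, `--user-data-dir`, `--window-size`) and Node's `child_process.spawn`; the function names `defaultBrowserPath`, `buildAppModeArgs`, and `launchOrb`, the install paths, the window size, and the idea of a locally served orb page are all hypothetical and not taken from Vox's code.

```javascript
// Hypothetical sketch: open a local page as a chrome-less app window using an
// already-installed Chromium browser instead of bundling Electron.
const { spawn } = require("node:child_process");
const os = require("node:os");
const path = require("node:path");

// Assumed default install locations; a real tool would probe more candidates
// (Edge, Chromium, per-user installs).
function defaultBrowserPath(platform = process.platform) {
  switch (platform) {
    case "win32":
      return "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe";
    case "darwin":
      return "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome";
    default:
      return "google-chrome"; // resolved via PATH on many Linux distros
  }
}

// Build the Chromium command-line switches for an app-mode window.
function buildAppModeArgs(url, { profileDir, width = 420, height = 520 } = {}) {
  const dir = profileDir ?? path.join(os.tmpdir(), "vox-orb-profile");
  return [
    `--app=${url}`,                     // window without tabs or address bar
    `--user-data-dir=${dir}`,           // separate profile, so a new window opens
    `--window-size=${width},${height}`, // initial window size in pixels
  ];
}

// Spawn the browser detached so the CLI process can exit independently.
function launchOrb(url, opts = {}) {
  const browser = opts.browserPath ?? defaultBrowserPath();
  const child = spawn(browser, buildAppModeArgs(url, opts), {
    detached: true,
    stdio: "ignore",
  });
  child.unref();
  return child;
}
```

Usage would look like `launchOrb("http://127.0.0.1:5173/orb.html")`, where the port and page are placeholders. Serving the page from localhost matters because browsers treat localhost as a secure context, which microphone access requires.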

The voice input part is straightforward enough, but the interesting question is how well it handles the parts of coding where spoken intent gets ambiguous fast. Saying "refactor that function" out loud works fine when context is obvious, but what happens when Copilot needs clarification and the back-and-forth becomes a longer conversation? Curious whether Vox supports that kind of multi-turn dialogue or whether it's essentially one-shot voice-to-prompt with no correction loop. Also wondering how it handles things like variable names, file paths, or syntax that's painful to dictate accurately.

Great question - it's full multi-turn, not one-shot. The orb stays open across the whole session: you can go back and forth as many times as you want, and if Copilot needs to ask a clarifying question, it just speaks that back and waits for your next turn like a normal conversation. For gnarly variable names/paths, I lean on the transcript panel + typed fallback - you can always type a turn instead of saying it, and typed replies still get read aloud, so it mixes voice and keyboard per-turn rather than forcing pure dictation.
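The per-turn mix of voice and keyboard described above can be sketched as a single loop where the input source is just a tag on the turn. This is an illustrative sketch, not Vox's implementation: `TurnLoop`, `ask` (send text to the agent session, resolve to the reply), and `speak` (queue text for speech) are hypothetical names.

```javascript
// Hypothetical sketch of a multi-turn loop where each turn may be spoken or
// typed; every reply is read aloud regardless of how the turn came in.
class TurnLoop {
  constructor({ ask, speak }) {
    this.ask = ask;     // assumed: sends text to the agent, resolves to reply text
    this.speak = speak; // assumed: queues text for text-to-speech
    this.transcript = [];
  }

  async submit(text, source) {
    // source is "voice" or "typed"; both feed the same session history
    this.transcript.push({ role: "user", source, text });
    const reply = await this.ask(text);
    this.transcript.push({ role: "agent", text: reply });
    this.speak(reply); // typed turns get a spoken reply too
    return reply;
  }
}
```

A clarifying question from the agent is simply the reply to one turn; the user answers it with the next turn, spoken or typed (for example, typing an awkward identifier like `parseConfigFile`).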

launching Chromium in app mode instead of shipping Electron is such a clean hack, one-line install with no build step because the browser already has the speech APIs. more tools should steal this

the barge-in interrupt is the detail that makes voice actually usable btw, nothing worse than waiting out a wrong answer

Thank you! Yeah, launching Chrome/Edge in app mode was the unlock: you get a real desktop-style window with zero Electron overhead, and the Web Speech APIs just work natively. Glad the barge-in landed too; that was the detail I iterated on most.

How does it handle accents or noisy environments in practice, and is the voice model running locally or hitting an external API that could add latency or cost per conversation?

It uses the browser's native Web Speech API (Chrome/Edge), so there's no separate model that Vox ships or bills for. Accent and noise handling is whatever your browser's built-in recognizer does. In Chrome that is generally solid, but it does call out to Google's speech service (not fully on-device), so it needs network access. No extra latency or cost comes from Vox itself, though: zero API keys, zero cloud calls of ours. There's definitely room to improve here: a local/offline recognition option (e.g. Whisper-based) is on my radar for a future version, especially for noisy environments and stronger accent coverage.
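To make "whatever your browser's built-in recognizer does" concrete, here is a sketch of how a page typically sets up Web Speech recognition. The properties used (`continuous`, `interimResults`, `lang`, `onresult`, `resultIndex`, `isFinal`, `transcript`) are part of the Web Speech API; the helper names `getRecognitionCtor` and `createRecognizer` and the `onText` callback are hypothetical, and the scope parameter exists only so the logic can be exercised outside a browser.

```javascript
// Sketch: feature-detect the Web Speech recognition constructor and configure
// it for continuous listening with live captions. Helper names are hypothetical.
function getRecognitionCtor(scope = globalThis) {
  // Chrome and Edge expose the prefixed name; the spec name is SpeechRecognition.
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

function createRecognizer(scope = globalThis, { lang = "en-US", onText } = {}) {
  const Ctor = getRecognitionCtor(scope);
  if (!Ctor) return null; // browser without speech recognition support
  const rec = new Ctor();
  rec.continuous = true;     // keep listening across pauses
  rec.interimResults = true; // partial results can drive live captions
  rec.lang = lang;           // recognition language hint
  rec.onresult = (event) => {
    // Only results from resultIndex onward changed in this event.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      onText?.(result[0].transcript, result.isFinal);
    }
  };
  return rec;
}
```

In a page you would call `createRecognizer(window, { onText }).start()`; in Chrome the audio is then processed by the server-side recognizer mentioned above.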

I love the idea of talking to Copilot. How smooth is the voice flow when you interrupt or correct it mid-conversation?

Pretty smooth: sentences are queued and spoken as they stream in (so it starts talking before the full reply arrives), and interrupting is a single tap or Esc that instantly kills both the audio and the in-flight response. Try it; the "barge-in" is honestly my favorite detail to demo.
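"Sentences are queued and spoken as they stream in" can be sketched as a small buffer that cuts streamed text at sentence boundaries. This is an assumption-laden illustration, not Vox's code: `SentenceStreamer` and its `enqueue` callback are hypothetical, and the boundary rule (`.`, `!`, or `?` followed by whitespace) is a simple heuristic that will also split after abbreviations like "e.g.".

```javascript
// Hypothetical sketch: buffer streamed text deltas, cut them into sentences,
// and hand each finished sentence to a speech queue so playback can start
// before the full reply has arrived.
class SentenceStreamer {
  constructor(enqueue) {
    this.enqueue = enqueue; // assumed: pushes one sentence onto the TTS queue
    this.buffer = "";
  }

  push(delta) {
    this.buffer += delta;
    // A sentence ends at . ! or ? followed by whitespace; a trailing "3."
    // waits for the next delta, so decimals like "3.5" are not split.
    const re = /[^.!?]*[.!?]+(?=\s)/g;
    let match;
    let consumed = 0;
    while ((match = re.exec(this.buffer)) !== null) {
      const sentence = match[0].trim();
      if (sentence) this.enqueue(sentence);
      consumed = re.lastIndex;
    }
    this.buffer = this.buffer.slice(consumed);
  }

  // Call when the stream ends to speak any unterminated remainder.
  flush() {
    const rest = this.buffer.trim();
    if (rest) this.enqueue(rest);
    this.buffer = "";
  }
}
```

In a browser, `enqueue` could be `(s) => speechSynthesis.speak(new SpeechSynthesisUtterance(s))`, since `speechSynthesis.speak` appends to the browser's own utterance queue.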

Does the orb stay open in the background while I keep coding, or do I have to keep invoking /vox every time I want to switch from typing to talking?

It stays open in the background; you don't need to re-run /vox each time. Once it's open, just keep coding as normal: tap the orb or hit Space whenever you want to switch to talking, and it goes right back to listening for your session. Running /vox again only matters if you want to switch which session the orb is listening to (it auto-focuses on whichever session last called it) or if it's been closed via /vox-stop.
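The "auto-focuses on whichever session last called it" rule can be illustrated with a tiny router. This is a hypothetical sketch: `OrbFocus`, its methods, and the per-session `send` callback are invented for illustration and do not come from the Copilot SDK or Vox.

```javascript
// Hypothetical sketch: the orb routes voice turns to whichever session most
// recently ran /vox; running /vox from another session moves the focus.
class OrbFocus {
  constructor() {
    this.sessions = new Map(); // sessionId -> send(text) callback (assumed)
    this.focused = null;
  }

  onVoxCommand(sessionId, send) {
    this.sessions.set(sessionId, send);
    this.focused = sessionId; // last caller wins
  }

  onVoxStop() {
    this.focused = null; // orb closed; the next /vox reattaches it
  }

  route(text) {
    const send = this.sessions.get(this.focused);
    if (!send) throw new Error("orb is not attached to a session");
    return send(text);
  }
}
```

Keeping the rule as "last caller wins" means switching sessions needs no extra UI: re-running /vox in the other session is the switch.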

I appreciate the simple setup process. Why not include offline support? I think even limited offline features would increase reliability.

Appreciate that! Offline is on my radar. Right now it leans on the browser's native Web Speech API for simplicity and zero-install, but that does need network access for recognition. A local/offline mode (likely Whisper-based) would genuinely help reliability in spotty-network or privacy-sensitive setups, so it's a good candidate for a future version.

That's clever. Any plans to support other AI coding assistants beyond GitHub Copilot?

Right now it's built tightly on the Copilot CLI's extension/SDK hooks (that's how it taps into turns, streaming replies, and session state), so it's Copilot-specific today, not agent-agnostic. That said, the voice layer itself (mic capture, barge-in, TTS queue) is a self-contained browser front-end, so porting the "wiring" to another agent's extension API is architecturally possible if there's interest. It's just not on the roadmap yet.
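The separation described above (a self-contained voice front-end plus agent-specific "wiring") maps naturally onto an adapter interface. The sketch below is purely illustrative: `AgentAdapter`, `sendTurn`, `VoiceFrontEnd`, and `echoAdapter` are hypothetical names, none of them come from the Copilot SDK, and the echo adapter stands in for a real agent.

```javascript
// Hypothetical sketch: the voice front-end only talks to a small adapter
// interface, so supporting another agent means writing another adapter.

/**
 * @typedef {Object} AgentAdapter
 * @property {(text: string, onDelta: (chunk: string) => void, signal: AbortSignal) => Promise<void>} sendTurn
 *   Sends one user turn and streams reply chunks to onDelta until done or aborted.
 */

class VoiceFrontEnd {
  /** @param {AgentAdapter} adapter */
  constructor(adapter, speakChunk) {
    this.adapter = adapter;
    this.speakChunk = speakChunk; // e.g. feeds a sentence splitter / TTS queue
    this.controller = null;
  }

  async turn(text) {
    this.controller = new AbortController(); // lets barge-in cancel this turn
    await this.adapter.sendTurn(text, this.speakChunk, this.controller.signal);
  }
}

// Stand-in adapter: streams the user's text back word by word.
const echoAdapter = {
  async sendTurn(text, onDelta, signal) {
    for (const word of text.split(" ")) {
      if (signal.aborted) return;
      onDelta(word + " ");
    }
  },
};
```

Under this design, the Copilot-specific code would live entirely inside one adapter object, and mic capture, barge-in, and the TTS queue would not need to change.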

Voice for coding agents gets compelling when interruption and correction are first-class, not an afterthought. The agent is going to misunderstand file names, symbols, and intent sometimes; the useful workflow is being able to stop it, restate the constraint, and keep the same session alive without touching the keyboard. Nice to see barge-in called out explicitly.

Exactly the philosophy: barge-in isn't bolted on, it's wired into the core turn loop. Tapping the orb (or hitting Esc) while it's thinking or speaking calls bargeCancel(), which aborts the in-flight request and stops the TTS queue immediately, so you can cut in, restate the constraint, and keep going in the same session. No waiting out a wrong turn.
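Here is one way a handler like bargeCancel() could be structured, under stated assumptions. Only the name `bargeCancel` comes from the reply above; its body, the `TurnController` class, `bindBargeIn`, and the `stopAudio` callback are hypothetical. `AbortController` is standard in browsers and Node, and in a browser `stopAudio` could be `() => speechSynthesis.cancel()`, which clears queued utterances and stops the current one.

```javascript
// Hypothetical sketch of a barge-in handler: one call aborts the in-flight
// agent request and drops queued speech. Only the name bargeCancel comes from
// the post; everything else here is an assumption.
class TurnController {
  constructor({ stopAudio }) {
    this.stopAudio = stopAudio; // browser: () => speechSynthesis.cancel()
    this.ttsQueue = [];         // sentences not yet handed to the speech engine
    this.inflight = null;
  }

  startRequest(run) {
    this.inflight = new AbortController();
    // run(signal) is assumed to stop streaming when the signal aborts
    return run(this.inflight.signal);
  }

  bargeCancel() {
    this.inflight?.abort();   // cancel the pending response
    this.inflight = null;
    this.ttsQueue.length = 0; // drop sentences not yet spoken
    this.stopAudio();         // cut off the sentence currently playing
  }
}

// Esc triggers barge-in; a tap handler on the orb could call the same method.
function bindBargeIn(target, controller) {
  target.addEventListener("keydown", (e) => {
    if (e.key === "Escape") controller.bargeCancel();
  });
}
```

Putting both the abort and the audio stop behind one method keeps the two from drifting apart: the agent never keeps streaming into speech the user has already interrupted.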

The reactive listening orb in its own dedicated window is a really nice touch, keeps the voice interaction feeling like a proper companion rather than just another terminal pane.

Thanks! That was very deliberate: I wanted it to feel like a companion you glance at and talk to, not just another pane competing for attention in your terminal. Launching it as its own chrome-less app-mode window (rather than a browser tab or Electron app) is what makes that possible while still keeping the Web Speech APIs working natively.