Sokuji — Privacy-first AI voice translator that runs right in your browser, no app install needed
Hi Product Hunt!
I'm building Sokuji, an open-source real-time AI voice translation tool. Speak your language in a video call, and your words are translated and spoken in the other person's language live.
Why Sokuji?
Privacy-first, local-first.
Sokuji can run the entire pipeline (speech recognition, translation, and text-to-speech) 100% locally in your browser using WebAssembly and WebGPU. No cloud, no API keys, and no data ever leaving your device. Your conversations stay yours. Cloud providers (OpenAI, Gemini, etc.) are available as options, not requirements.
Just install a browser extension. That's it.
No desktop app download, no signup, no setup wizard. Install the Chrome/Edge extension, open your Google Meet / Teams / Zoom call, and start translating. It works in the browser sidebar right where your meeting already is. (A desktop app with system audio capture is also available for power users, but most people won't need it.)
---
What it can do
Offline local inference
- 12+ ASR models (SenseVoice, Whisper WebGPU, Paraformer, Parakeet TDT...)
- 49 Opus-MT translation pairs + Qwen 3.5 for multilingual translation
- Multiple TTS engines (Piper, Matcha, MeloTTS)
- Smart model variant selection based on your GPU capabilities
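
For the curious, variant selection of this kind can be sketched roughly as below. This is only an illustration under stated assumptions, not Sokuji's actual code: the `GpuCapabilities` shape, the variant names, and the size thresholds are all hypothetical. In a browser, the capability fields could be filled from the WebGPU adapter (for example the `shader-f16` feature and the `maxBufferSize` limit).

```typescript
// Hypothetical capability summary. In a browser these values could come from
// the adapter returned by navigator.gpu.requestAdapter().
interface GpuCapabilities {
  hasWebGpu: boolean;       // false when navigator.gpu is missing or no adapter is returned
  supportsF16: boolean;     // e.g. adapter.features.has("shader-f16")
  maxBufferSizeMB: number;  // e.g. adapter.limits.maxBufferSize / 2 ** 20
}

// Hypothetical variant names, chosen for illustration only.
type ModelVariant = "webgpu-fp16" | "webgpu-fp32" | "wasm-int8";

// Pick the most capable variant the device can run.
// fp32SizeMB is the size of the full-precision weights; fp16 needs about half.
function pickModelVariant(caps: GpuCapabilities, fp32SizeMB: number): ModelVariant {
  if (!caps.hasWebGpu) return "wasm-int8";
  if (caps.supportsF16 && caps.maxBufferSizeMB >= fp32SizeMB / 2) return "webgpu-fp16";
  if (caps.maxBufferSizeMB >= fp32SizeMB) return "webgpu-fp32";
  return "wasm-int8"; // GPU present but too constrained: fall back to CPU/WASM
}
```

The idea is simply to prefer the GPU path when the device can hold the weights, and to fall back to a quantized WASM build otherwise.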
7+ cloud providers (optional)
OpenAI Realtime (GA, WebRTC), Google Gemini, Volcengine (ByteDance), Palabra AI, any OpenAI-compatible endpoint, or our hosted Kizuna AI option. Bring your own API key if you want lower latency or higher quality.
Modern audio pipeline
AudioWorklet-based low-latency processing, built-in noise suppression, dynamic device switching mid-session, cross-platform system audio capture, virtual audio devices on macOS/Windows.
35+ UI languages: the app itself speaks your language too.
Other highlights: text input translation, karaoke-style TTS highlighting, Push-to-Talk, auto-updater (desktop), Stripe token wallet for hosted API.
---
What's next
The local-first approach is working, but we want to push it much further:
- TranslateGemma 4B: Google's 55+ language any-to-any translation model via WebGPU, removing the current English-pivot limitation of Opus-MT ([#123](https://github.com/kizuna-ai-lab...))
- Voxtral Mini 4B: Mistral's 13-language real-time streaming ASR with <500ms latency ([#125](https://github.com/kizuna-ai-lab...))
- Native inference in Electron: bypass WASM overhead and unlock GPU acceleration for significantly faster local performance ([#129](https://github.com/kizuna-ai-lab...))
- More meeting platforms: Webex, Jitsi, GoTo, RingCentral
- Windows & macOS code signing, Linux AppImage + Flatpak
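
For anyone unfamiliar with the English-pivot limitation mentioned in the list above: when a pairwise model set has no direct model for a language pair, the text is translated into English first and then into the target language, which costs two inference passes and can compound errors. A minimal sketch of that routing follows. The `"src-tgt"` key format and the `planRoute` helper are assumptions made for illustration, not Sokuji's actual implementation.

```typescript
// Pivot language used when no direct model exists.
const PIVOT = "en";

// Given the set of available pairwise models (keys like "ja-en", a hypothetical
// format), return the list of models to run in order, or null if no route exists.
function planRoute(src: string, tgt: string, pairs: ReadonlySet<string>): string[] | null {
  if (src === tgt) return [];                 // nothing to translate
  const direct = `${src}-${tgt}`;
  if (pairs.has(direct)) return [direct];     // one hop, no pivot needed
  const toPivot = `${src}-${PIVOT}`;
  const fromPivot = `${PIVOT}-${tgt}`;
  if (pairs.has(toPivot) && pairs.has(fromPivot)) {
    return [toPivot, fromPivot];              // two hops through English
  }
  return null;                                // unsupported pair
}

// Example: Japanese to German has no direct model here, so it pivots.
const available = new Set(["ja-en", "en-de", "en-fr", "fr-en"]);
const route = planRoute("ja", "de", available); // ["ja-en", "en-de"]
```

An any-to-any model avoids the second hop entirely, which is the motivation for the TranslateGemma item.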
---
Links
- Website: https://sokuji.kizuna.ai
- GitHub: https://github.com/kizuna-ai-lab...
- Chrome Web Store: https://chromewebstore.google.co...
- Desktop download (optional): https://github.com/kizuna-ai-lab...
Built by Kizuna AI Lab. Feedback, issues, and PRs welcome. If you deal with multilingual meetings or care about keeping your conversations private, I'd love to hear from you.
🚀 Sokuji Now Supports Gemini 2.5 Flash & 2.0 Flash Live APIs (Beta)
Real-time AI translation powered by Google's latest Gemini models - now in beta testing!
We're excited to announce that Sokuji now supports Google's cutting-edge Gemini APIs in beta:
New Beta Features:
Sokuji browser extension now supports Google Meet, Microsoft Teams, and Zoom!
It's been amazing to see how Sokuji helps break down language barriers in online meetings, letting you speak in your own language while others hear real-time voice-to-voice translations. No more typing or relying on subtitles alone!
That said, we're still early in the journey and really need more user feedback. If you've tried it out (or want to!), I'd love to hear how it's working for you and what improvements or features would make it even better.
Thanks for all the support so far, and looking forward to your thoughts!




