Matt Navarra

Krisp Voice Translation API - Real-time speech-to-speech translation API

Most voice translation APIs work great in demos. Then real users show up with background noise, accents, and verification codes that get garbled. We built our technology on a million live contact center calls where accuracy is non-negotiable. 96% accuracy on real calls, zero patient safety incidents, 61+ languages with any-to-any pairs. The Translation API is now available self-serve, with 60 minutes of free credit when you sign up for the dev dashboard.

Asti Pili
Hey Product Hunt! We've been running real-time voice translation in enterprise contact centers. Healthcare, insurance, finance. Calls where a wrong word means a patient safety incident or a compliance violation. That pressure built an engine most benchmarks can't replicate. 96% accuracy on live calls with real accents and noise. Zero patient safety incidents across 8+ languages. Over a million minutes of production translation. Today we're opening that engine up as a self-serve API. Same model, same accuracy, same 61 languages. Python and JS SDKs, playground with 60 free minutes, custom vocabulary and translation dictionaries from day one. No sales call. If you're building anything where voice crosses a language barrier and accuracy matters, hear it yourself: https://lab.krisp.ai/products/vo... Our team will be here all day. Ask us anything.
Olia Nemirovski

@asti_pili C++ SDK timeline? Any plans for Go, Rust, or mobile SDKs?

Asti Pili

@olia_nemirovski The C++ SDK is coming soon, but there's no specific date yet.

Saul Fleischman

@asti_pili This is impressive work—the fact that you've handled over a million minutes of production translation in regulated industries with zero incidents is a strong signal of real reliability. The 96% accuracy on live calls with real-world noise is the kind of number that matters way more than lab benchmarks.

Volodymyr Demchenko

Oh my, I've been looking for something like this for months. Do you have plans to integrate this into your mobile app as native functionality?

Asti Pili

@volodymyr_demchenko It will definitely come to our consumer product.

Volodymyr Demchenko

@asti_pili Amazing, will be waiting for this!

Igor Gurovich

The "works great in demos, then real users show up with background noise and accents" line is exactly the wall we hit building voice AI for older adults. Phone-quality audio and unfamiliar accents break most pipelines that benchmark beautifully. Training on a million real contact-center calls is a smart moat for that reason. One question on the speech-to-speech path: how much added latency does translation introduce over plain transcription, and is it low enough to keep a live call feeling like a natural back-and-forth?

Asti Pili

@igorgurovich good questions.

We measure latency in Krisp as the time to first translated audio after a person speaks.

  • Total time-to-first-translated-audio is approximately 1.5–3 seconds, driven by three factors: context window size, source language structural complexity, and amount of speech. AI inference latency is around 700-800ms here.

  • Language structural complexity is the primary variable. Languages with word order parallel to the target language (e.g., Spanish) can be translated incrementally as words arrive, resulting in latency toward the lower end.

  • Languages with high reordering distance, such as Japanese, Korean, or Turkish, are verb-final or agglutinative, so the model must buffer more context before producing a grammatically correct translation, which pushes latency toward the higher end.

  • The AI latency difference between transcription and translation is ~100ms.
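
If you want to check this metric against your own calls, here is a minimal Python sketch of a per-turn timer. It is not part of the Krisp SDK: the class and method names are hypothetical, and it assumes you measure from the moment the speaker starts talking and that your client can timestamp both the speech onset and the first chunk of translated audio.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnTimer:
    """Tracks time-to-first-translated-audio for one speaker turn (hypothetical helper)."""

    speech_start: Optional[float] = None
    first_audio: Optional[float] = None

    def on_speech_start(self, t: float) -> None:
        # Only the first onset in the turn counts.
        if self.speech_start is None:
            self.speech_start = t

    def on_translated_audio(self, t: float) -> None:
        # Only the first translated chunk after speech onset counts.
        if self.speech_start is not None and self.first_audio is None:
            self.first_audio = t

    def latency(self) -> Optional[float]:
        """Seconds from speech onset to first translated audio, or None if incomplete."""
        if self.speech_start is None or self.first_audio is None:
            return None
        return self.first_audio - self.speech_start
```

Feed it timestamps from a monotonic clock (for example `time.monotonic()`) so wall-clock adjustments don't skew the result.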

Jolene

Impressive to see accuracy claims based on real contact center traffic instead of lab conditions. How does the API handle industry-specific terminology, like healthcare or financial services vocabulary, where a single mistranslation can create major issues?

Lusine Mnatsakanyan

Looking forward to seeing what the Product Hunt community builds with it. We'd love your feedback!

Matt Navarra

Great to see this finally launched. Super useful update to a great tool!

Thami Benjelloun

Congratulations! I will definitely try it.

Davit Baghdasaryan

So proud of this launch!

Gaurav Aroraa

Real-time speech-to-speech at API level means you've solved the three-stage pipeline problem: ASR accuracy, translation context, and TTS naturalness all simultaneously. We've built on streaming audio APIs and the hardest part is always mid-utterance interruptions breaking the translation context. What's your P99 latency for a 10-second utterance, and how do you handle speaker turn overlap?

Asti Pili

@retain_dev Great questions — you're clearly speaking from experience with the same failure modes we've worked through.

On latency: We don't benchmark against utterance length, and that's deliberate. A 10-second utterance never waits for completion — it's streamed and translated as a series of segments, so utterance-level P99 would mostly measure how long the speaker talked, not how fast the system is. The segment-level numbers are what matter: typical segments run 1.5–2 seconds of speech, with translation delay averaging under a second and staying under ~1 second even at P95. In practice, listeners hear translated audio continuously throughout a long utterance rather than waiting for it to end.
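
For anyone who wants to track the same segment-level view on their side, a rough Python sketch of a nearest-rank percentile over per-segment translation delays follows. The function name and the sample delays are made up for illustration; they are not Krisp measurements, and this is not how our own benchmarks are implemented.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Illustrative per-segment translation delays in seconds (invented data).
sample_delays_s = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.6, 0.5, 0.7]

p50 = percentile(sample_delays_s, 50)  # 0.6
p95 = percentile(sample_delays_s, 95)  # 0.95
```

Collecting one delay per segment, rather than one per utterance, keeps the number independent of how long the speaker talks.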

On mid-utterance interruptions: Our segmentation is context-aware rather than purely acoustic. When the model judges that the audio received so far is insufficient for a reliable translation, it waits — within a configurable threshold — for more input before committing. If nothing more arrives, it emits the best translation from accumulated context. This bounds the quality/responsiveness trade-off and avoids the context fragmentation you're describing, where an interruption forces a translation of a half-formed thought.
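
To make the wait-then-commit behavior concrete, here is a toy Python sketch. It is an illustration only, not our implementation: `SegmentBuffer`, `is_sufficient`, and `max_wait_s` are hypothetical names, the sufficiency check stands in for the model's judgment, and it assumes the wait is measured from the last input received.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SegmentBuffer:
    """Commits a segment when context is sufficient, or after a bounded wait."""

    is_sufficient: Callable[[str], bool]  # hypothetical stand-in for the model's judgment
    max_wait_s: float                     # configurable wait threshold, in seconds
    _parts: List[str] = field(init=False, default_factory=list)
    _last_input: float = field(init=False, default=0.0)

    def feed(self, text: str, now: float) -> Optional[str]:
        """Add newly recognized text; return a segment if it is ready to translate."""
        self._parts.append(text)
        self._last_input = now
        if self.is_sufficient(" ".join(self._parts)):
            return self._flush()
        return None  # not enough context yet, so wait for more input

    def tick(self, now: float) -> Optional[str]:
        """Call periodically; emits the accumulated context once the wait expires."""
        if self._parts and now - self._last_input >= self.max_wait_s:
            return self._flush()
        return None

    def _flush(self) -> str:
        segment = " ".join(self._parts)
        self._parts.clear()
        return segment
```

With a toy check such as `lambda s: s.endswith(".")`, a half-formed phrase is held until either the sentence completes or `max_wait_s` passes with no new input.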

On speaker turn overlap: The API is designed as a single semi-synchronous translation stream, so overlap handling stays with the client — you decide whether overlapping speakers get separate streams or how to arbitrate the floor. We're adding an explicit interrupt command so clients can control what happens to in-flight translation and synthesis when a turn is cut off, rather than us guessing at a policy that won't fit every application.
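
One simple client-side policy, sketched in Python under stated assumptions: the first speaker to start holds the floor, and frames from anyone else are dropped until the floor is released. `FloorArbiter` and its methods are hypothetical and not part of the API; giving each speaker a separate stream is an equally valid choice.

```python
from typing import Hashable, Optional


class FloorArbiter:
    """First-come floor control for a single translation stream (hypothetical helper)."""

    def __init__(self) -> None:
        self.holder: Optional[Hashable] = None

    def request(self, speaker: Hashable) -> bool:
        """Return True if this speaker holds the floor after the request."""
        if self.holder is None:
            self.holder = speaker
        return self.holder == speaker

    def release(self, speaker: Hashable) -> None:
        """Free the floor, but only if the releasing speaker actually holds it."""
        if self.holder == speaker:
            self.holder = None

    def should_forward(self, speaker: Hashable) -> bool:
        """Decide whether a frame from this speaker goes to the translation stream."""
        return self.request(speaker)
```

Call `should_forward` for each incoming audio frame and `release` when your voice activity detection reports that the current speaker has stopped.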

Alina Tyslenok

Congrats on the launch! 🚀 Real-time voice translation is impressive on its own, but production experience in healthcare and finance makes it even more compelling. Best of luck today!
