Voice AI for Meetings

Krisp Voice Translation API - Real-time speech-to-speech translation API

Anchor

•2mo ago

Most voice translation APIs work great in demos. Then real users show up with background noise, accents, and verification codes that get garbled. We built our technology on a million live contact center calls where accuracy is non-negotiable: 96% accuracy on real calls, zero patient safety incidents, and 61+ languages with any-to-any pairing. The Translation API is now available self-serve, with 60 minutes of free credit when you sign up for the dev dashboard.

Replies

Stripo.email

Congrats on the launch! 🚀 Real-time voice translation is impressive on its own, but production experience in healthcare and finance makes it even more compelling. Best of luck today!

2mo ago

How well does Custom Vocabulary / Dictionary work? How many terms can I add, and does it slow things down?

2mo ago

Krisp

Maker

@marija_pojasnikova The API supports custom vocabulary, so you can pass your specific terminology. We also support a custom translation_dictionary, where you can provide an exact word and its translation for each language; that word will always be rendered with your translation.
Here's the documentation on how to pass these parameters: https://sdk-docs.krisp.ai/docs/voice-translation-api#initial-client-message

2mo ago

Krisp

Maker

@marija_pojasnikova Both the vocabulary and the dictionary can contain up to 200 entries each.
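
For anyone wiring this up, here is a rough sketch of building the session-start parameters and checking the 200-entry limit on the client side before sending. Only the name translation_dictionary and the 200-entry cap come from this thread; the `vocabulary` key, the `source`/`translations` layout, and the function name are assumptions for illustration, so check the linked documentation for the real schema.

```python
import json

MAX_ENTRIES = 200  # limit stated in this thread for both lists


def build_initial_message(vocabulary, translation_dictionary):
    """Serialize session-start parameters (hypothetical field layout).

    vocabulary: list of terms the recognizer should know.
    translation_dictionary: {source_word: {language_code: fixed_translation}}.
    """
    if len(vocabulary) > MAX_ENTRIES:
        raise ValueError(f"vocabulary has more than {MAX_ENTRIES} entries")
    if len(translation_dictionary) > MAX_ENTRIES:
        raise ValueError(f"translation_dictionary has more than {MAX_ENTRIES} entries")
    payload = {
        "vocabulary": list(vocabulary),  # assumed key name
        "translation_dictionary": [
            # assumed entry shape: one record per locked source word
            {"source": word, "translations": dict(by_language)}
            for word, by_language in translation_dictionary.items()
        ],
    }
    return json.dumps(payload, ensure_ascii=False)
```

Validating locally means an oversized list fails fast in your own code instead of at connection time.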

2mo ago

Product Hunt

Krisp launches go wayyyy back. Congrats on the latest. :)

2mo ago

Krisp

Maker

@rrhoover Yeah, way back to those working-from-home days during COVID, when we launched Noise Cancellation.

2mo ago

Interesting breakdown of the latency tradeoffs. The language reordering problem is something many demos conveniently avoid discussing. I was wondering how you're handling workload spikes when multiple streams require larger context windows simultaneously. Do you dynamically allocate translation capacity per stream, or is there some form of queueing and prioritization to prevent latency from cascading across tenants?

2mo ago

Krisp

Maker

@apexbackene6x You're right that this is where a lot of real-time translation systems fall over — the failure mode is usually context accumulation, where long sessions inflate per-stream memory and a burst of concurrent streams turns into cascading queue delay.

We sidestep most of that architecturally. Translation context windows are deliberately narrow — scoped to what's needed for translation continuity across the current segment boundary, not extended conversation history. Unlike serving a general-purpose LLM, per-session footprint stays small and roughly constant over the life of a stream, so "many streams simultaneously needing larger context" isn't a spike vector in the first place. Sessions are also fully isolated: one tenant's workload spike can't bleed into another tenant's latency.

On capacity itself, scaling is horizontal and automatic against concurrent session load — there's no manual provisioning step, and no shared queue where one tenant's burst pushes everyone else back. For customers with strict latency requirements, we additionally offer reserved capacity with guaranteed compute allocation, so your streams are served from dedicated headroom rather than competing in the general pool.

So short answer: per-stream allocation, kept cheap by design, with isolation rather than prioritization doing the cascade-prevention work — and reservation available where best-effort isn't acceptable.
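
To make the "small and roughly constant per-session footprint" point concrete, here is a toy sketch of a bounded, per-session context. It is not Krisp's implementation; the class name, the three-segment window, and the in-memory dictionary are all assumptions chosen only to illustrate the idea.

```python
from collections import deque


class StreamContext:
    """Per-stream context that keeps only the most recent segments."""

    def __init__(self, max_segments=3):
        # deque with maxlen drops the oldest segment automatically,
        # so memory per stream does not grow with session length.
        self._segments = deque(maxlen=max_segments)

    def add_segment(self, text):
        self._segments.append(text)

    def window(self):
        return list(self._segments)


# One isolated context per session id; nothing is shared across streams.
contexts = {}


def on_segment(session_id, text):
    """Record a new segment and return the context used to translate it."""
    ctx = contexts.setdefault(session_id, StreamContext())
    ctx.add_segment(text)
    return ctx.window()
```

Because each session owns its own fixed-size window, a long call costs the same as a short one, and one session's history never appears in another's context.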

2mo ago

The custom translation_dictionary that locks an exact term-to-term mapping is the detail I'd actually reach for — brand and product names are usually the first thing these pipelines mangle. On the free 60-min tier, is the dictionary per-request or stored server-side once you set it up?

2mo ago

Krisp

Maker

@lennoxbeflying You pass it as parameters. Here is the documentation: https://sdk-docs.krisp.ai/docs/voice-translation-api#initial-client-message

2mo ago

Humalike

Following Krisp: Voice AI for Meetings with interest. What is next on the roadmap after launch day?

2mo ago

Krisp

Maker

@borrellbr Voice (sky) is the limit :D
Seriously though, translation will probably be the next thing to land in the voice AI app.

2mo ago

Real-time speech-to-speech is the hard part - what's the round-trip latency at acceptable quality? Most translation APIs I've tested hit 800ms+ which is fine for async but breaks conversational flow completely.

2mo ago

Krisp

Maker

@christian_knaut We measure latency in Krisp as the time to first translated audio after a person speaks.

Total time-to-first-translated-audio is approximately 1.5–3 seconds, driven by three factors: context window size, source language structural complexity, and amount of speech. AI inference latency is around 700-800ms here.
Language structural complexity is the primary variable. Languages with word order parallel to the target language (e.g., Spanish) can be translated incrementally as words arrive, resulting in latency toward the lower end.
Languages with high reordering distance, such as Japanese, Korean, or Turkish, are verb-final or agglutinative, requiring the model to buffer more context before producing a grammatically correct translation, resulting in latency toward the higher end.
The AI latency difference between transcription and translation is ~100ms.
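
If you want to measure the same metric in your own integration, a minimal sketch could look like the following. It assumes you can timestamp the speech reference point and each translated audio chunk yourself; the class and method names are made up for illustration and are not part of the Krisp API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnTimer:
    """Tracks time-to-first-translated-audio for one spoken turn."""

    speech_ts: float                       # reference timestamp for the speech, in seconds
    first_audio_ts: Optional[float] = None

    def on_translated_audio(self, ts: float) -> None:
        # Only the first translated chunk counts toward the metric.
        if self.first_audio_ts is None:
            self.first_audio_ts = ts

    def latency_s(self) -> Optional[float]:
        # None until some translated audio has arrived.
        if self.first_audio_ts is None:
            return None
        return self.first_audio_ts - self.speech_ts
```

Feed it timestamps from a monotonic clock such as `time.monotonic()` so wall-clock adjustments do not skew the result.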

2mo ago

Spotlight by Backplanes

@asti_pili tangential question-- how does this work with speaker attribution when multiple people are in a room together? This is my biggest pet peeve with most transcription agents today. Granola's amazing if I use my phone, and has nothing when using my laptop. My kingdom for good attribution regardless of setting! What does Krisp do here?

2mo ago

Krisp

Maker

@antifreeze This is a translation API launch from our developer product. You are referring to our voice AI app for meetings. To answer your question: speaker attribution is a tough challenge. Currently we use AI to suggest speakers based on the transcription.

2mo ago

Huge fan of Krisp's noise cancellation, and seeing you guys cross the language line with a Dev API is massive! For developers looking to integrate this translation API into real-time voice apps (like live customer support), what is the average latency (in milliseconds) we should expect for the live translation stream?

2mo ago

Krisp

Maker

@dropa We measure latency in Krisp as the time to first translated audio after a person speaks.

Total time-to-first-translated-audio is approximately 1.5–3 seconds, driven by three factors: context window size, source language structural complexity, and amount of speech. AI inference latency is around 700-800ms here.
Language structural complexity is the primary variable. Languages with word order parallel to the target language (e.g., Spanish) can be translated incrementally as words arrive, resulting in latency toward the lower end.
Languages with high reordering distance, such as Japanese, Korean, or Turkish, are verb-final or agglutinative, requiring the model to buffer more context before producing a grammatically correct translation, resulting in latency toward the higher end.
The AI latency difference between transcription and translation is ~100ms.

2mo ago

Humalike

Congrats on the launch! How do you plan to handle all the traffic during these days?

2mo ago
