
Introducing Parrot: Ringg’s speech-to-text model for production-grade voice agents. Capture Hindi-heavy and noisy real-world conversations with low-latency inference, stronger transcript quality, and Hindi validation built for downstream workflows.




Hey Product Hunt 👋
Thrilled to introduce Parrot, Ringg’s speech-to-text model built for production-grade voice agents.
Most STT models do well on clean audio. Voice agents don’t get clean audio. They deal with compressed phone calls, Hindi-English code-switching, Indian accents, background noise, and conversations where one misheard word can break the next action.
What makes it different:
🦜 Built for real-world calls
🦜 Low-latency inference for smoother voice-agent conversations
🦜 Hindi validation and normalization for cleaner downstream workflows
🦜 Strong normalized WER on open-source Hindi benchmarks
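Normalized WER is word error rate computed after the reference and the model's transcript both pass through the same text normalizer, so differences in case or punctuation are not counted as recognition errors. Below is a minimal Python sketch, assuming a simple illustrative normalizer (Unicode NFC, case folding, and all punctuation, including the Devanagari danda, replaced by spaces). The normalization rules Ringg actually uses for its benchmarks are not described in this post, and `normalize` and `wer` are hypothetical helpers.

```python
import unicodedata
from typing import Callable, List


def normalize(text: str) -> List[str]:
    """Illustrative normalizer (an assumption, not Parrot's actual rules)."""
    text = unicodedata.normalize("NFC", text).casefold()
    # Replace every Unicode punctuation character (category P*), which
    # includes the Devanagari danda "।", with a space.
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    return cleaned.split()


def wer(
    reference: str,
    hypothesis: str,
    tokenize: Callable[[str], List[str]] = normalize,
) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    # Row-by-row Levenshtein distance computed over words, not characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (0 if the words match)
            )
        prev = cur
    return prev[-1] / max(len(ref), 1)


if __name__ == "__main__":
    ref, hyp = "Hello, World!", "hello world"
    print(wer(ref, hyp, tokenize=str.split))  # raw WER: 1.0
    print(wer(ref, hyp))                      # normalized WER: 0.0
```

The same pair scores 1.0 raw and 0.0 normalized, which is why WER figures are only comparable when the normalizer behind them is stated.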
For teams building voice agents, Parrot helps turn messy speech into cleaner transcripts that LLMs can actually use.
Try it out and let us know what you're building with it!
@itsmeparth can't find a better model for my Indian customers.
Are you also working on European languages (Spanish, German?), or are they coming soon?
Yes, absolutely.
Right now Parrot is focused on Indian conversations, but European languages, including Spanish and German, are on the roadmap.
We’re starting with the languages where we see the highest production demand, and then expanding coverage from there.
Congrats on the launch! Building voice sessions into a couples app right now (currently on Deepgram for streaming transcription), so the "voice agents don't get clean audio" framing really lands. Clean-audio benchmarks oversell every STT model until you hit a real room. One thing I've run into that I'd love your take on: the hardest case isn't accent or noise, it's two people talking, with overlapping speech, interruptions, and one person finishing the other's sentence. Most STT degrades badly there. Is Parrot tuned mainly for the single-caller voice-agent case (one human, one agent), or does it hold up on genuine multi-speaker conversations? Curious whether that's a roadmap item or a deliberate scope line.
Thanks @ferdi_sigona, this is a very real point.
Parrot is primarily tuned today for the single-caller voice-agent case: one human speaking to one agent, with interruptions, short turns, and messy call audio.
Multi-speaker conversation with overlapping speech is a genuine problem to solve. Parrot handles some interruption patterns, including background human speech, but full multi-speaker diarization and overlap handling are roadmap items rather than something we'd overclaim today.
The scope is deliberate: first make real-time voice-agent calls reliable, then expand deeper into multi-speaker scenarios.
This looks really solid 🔥
Curious about latency and how it performs in noisy real-world calls compared to Whisper.
Thanks, Vasyl!
Whisper is excellent as a general-purpose ASR model, especially for offline and batch transcription.
Parrot is optimized more specifically for production voice agents: streaming calls, low end-of-speech to final transcript latency, and messy real-world audio where the transcript needs to trigger the next action.
We’re also benchmarking on noisy call conditions and Hindi-heavy conversations, not just clean audio. Whisper is not specifically optimized for Indian accents, and its latency can be higher for real-time voice-agent use cases.
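One way to measure the end-of-speech to final-transcript latency mentioned above on your own calls is to log two timestamps per turn and report a percentile over many turns. Below is a minimal sketch, assuming you already record when the caller stopped speaking and when the final transcript arrived; the `Turn` class and its field names are hypothetical, not part of any Parrot API, and the percentile uses the nearest-rank definition.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    end_of_speech_s: float     # hypothetical: caller stopped speaking (seconds)
    final_transcript_s: float  # hypothetical: final transcript received (seconds)

    @property
    def latency_ms(self) -> float:
        return (self.final_transcript_s - self.end_of_speech_s) * 1000.0


def percentile_nearest_rank(values: List[float], pct: int) -> float:
    """Smallest value such that at least pct% of the values are <= it."""
    if not values or not 0 < pct <= 100:
        raise ValueError("need at least one value and 0 < pct <= 100")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    turns = [Turn(10.0, 10.25), Turn(20.0, 20.5), Turn(30.0, 30.125)]
    latencies = [t.latency_ms for t in turns]
    print(latencies)                               # [250.0, 500.0, 125.0]
    print(percentile_nearest_rank(latencies, 95))  # 500.0
```

Nearest-rank is only one of several percentile conventions, so a published P95 figure should name it alongside the test conditions.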
Building a dedicated validation layer for Hindi downstream workflows is clever. Most generic STT APIs fall apart on code-switching and regional accents. We've hit similar walls where raw transcripts were too noisy for reliable intent parsing in production pipelines. How do you handle Hinglish code-switching, and what's the P95 latency on a 10-second audio chunk?
Thanks @retain_dev, exactly! Raw STT output is often not enough once it has to drive intent parsing or downstream workflows.
For Hinglish, Parrot is trained on code-mixed speech and uses Hindi-aware tokenisation plus a normalisation layer, so the output stays cleaner before it reaches the LLM or API.
On P95 latency for a 10-second chunk, we’re finalising the published benchmark setup and don’t want to quote a loose number without the test conditions.
In real-world voice-agent calls, audio usually does not arrive as one fixed 10-second block. Parrot can segment longer audio into shorter chunks, which helps return responses faster and keeps turn-taking more natural.
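Splitting a longer recording into shorter pieces can be sketched as fixed-length windowing over PCM samples. This is an illustrative assumption only: the post does not say how Parrot segments audio (a production system might cut on voice-activity boundaries instead), and the 16 kHz sample rate, 2-second chunk length, and `chunk_audio` name are all hypothetical.

```python
from typing import List, Sequence


def chunk_audio(
    samples: Sequence[int],
    sample_rate: int = 16_000,  # assumed mono 16 kHz PCM
    chunk_s: float = 2.0,       # hypothetical chunk length in seconds
    overlap_s: float = 0.0,     # optional overlap between consecutive chunks
) -> List[Sequence[int]]:
    """Split audio samples into fixed-length, optionally overlapping chunks."""
    size = int(chunk_s * sample_rate)
    step = size - int(overlap_s * sample_rate)
    if size <= 0 or step <= 0:
        raise ValueError("chunk length must be positive and longer than the overlap")
    chunks = []
    for start in range(0, len(samples), step):
        chunks.append(samples[start:start + size])
        if start + size >= len(samples):
            break  # this chunk already reaches the end of the audio
    return chunks


if __name__ == "__main__":
    ten_seconds = [0] * 160_000  # 10 s of silence at 16 kHz
    print(len(chunk_audio(ten_seconds)))                 # 5
    print(len(chunk_audio(ten_seconds, overlap_s=0.5)))  # 7
```

Each chunk can be handed to the recognizer as soon as it fills instead of waiting for the whole recording; the optional overlap lets a word cut at a boundary appear in both neighbouring chunks.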
Our AI has a voice mode. We use ChatGPT (before that, we used speech-to-text followed by text-to-speech). How is your service better? Is it only better Hindi recognition, or is there something else?
Thanks @natalia_iankovych, fair question.
Hindi recognition is definitely a big focus, but not the only one. Parrot is also optimized for low-latency transcription, code-mixed conversations, cleaner normalized output, and real-world call audio where transcripts are expected to trigger the next action.
Happy to set up a demo as well to understand your use case better and explore how Parrot can help solve it.
Congratulations on the launch! By the way, when I mix English with Hindi, I've noticed it's a little biased towards transcribing English in Hindi (using Devanagari script). Latency is impressive.
Thanks @ashishkingdom! That's a fair observation.
For code-mixed conversations where Hindi is the dominant language, this can happen, but when English is dominant, it should work as expected.
This is expected behaviour.
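To check this behaviour on your own code-mixed test calls, a rough proxy is the share of Devanagari versus Latin characters in the returned transcript. Below is a minimal sketch, assuming script counts are a good-enough stand-in for language, which they are not exactly (romanized Hindi is written in Latin letters). `script_share` is a hypothetical helper, not part of Parrot.

```python
from typing import Dict


def script_share(text: str) -> Dict[str, float]:
    """Share of Devanagari-block code points vs. ASCII Latin letters.

    The Devanagari block (U+0900 to U+097F) also contains vowel signs and
    the danda, so those are counted on the Devanagari side.
    """
    devanagari = sum(1 for ch in text if "\u0900" <= ch <= "\u097f")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    total = devanagari + latin
    if total == 0:
        return {"devanagari": 0.0, "latin": 0.0}
    return {"devanagari": devanagari / total, "latin": latin / total}


if __name__ == "__main__":
    print(script_share("meeting कल है"))
```

A transcript of mostly English speech that comes back with a high Devanagari share would reproduce the bias described in the comment above.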
Haha, how can something be this useful and this scary simultaneously!? As someone with a name most humans can't spell right, I look forward to the day when this is no longer an issue.
Exactly! Names are where STT gets very real very fast.
A big part of Parrot’s focus is making these real-world details more reliable, especially in Indian conversations.