Launching today

Introducing Parrot: Ringg’s speech-to-text model for production-grade voice agents. Capture Hindi-heavy and noisy real-world conversations with low-latency inference, stronger transcript quality, and Hindi validation built for downstream workflows.




Hey Product Hunt 👋
Thrilled to introduce Parrot, Ringg’s speech-to-text model built for production-grade voice agents.
Most STT models do well on clean audio. Voice agents don’t get clean audio. They deal with compressed phone calls, Hindi-English code-switching, Indian accents, background noise, and conversations where one misheard word can break the next action.
What makes it different:
🦜 Built for real-world calls
🦜 Low-latency inference for smoother voice agent conversations
🦜 Hindi validation and normalization for cleaner downstream workflows
🦜 Strong Normalised WER performance on open-source Hindi benchmarks
For teams building voice agents, Parrot helps turn messy speech into cleaner transcripts that LLMs can actually use.
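For anyone unfamiliar with the metric: normalised WER is word error rate computed after both the reference and the model output go through the same text cleanup, so punctuation or casing differences are not counted as errors. Below is a minimal sketch of the idea, not Parrot's actual evaluation pipeline. The normalization rules (Unicode NFC, lowercasing, punctuation stripping) and the function names are assumptions chosen for illustration.

```python
import unicodedata


def normalize(text: str) -> list[str]:
    """Assumed normalization: NFC, lowercase, punctuation replaced by spaces."""
    text = unicodedata.normalize("NFC", text).lower()
    # Unicode categories starting with "P" are punctuation (this includes the
    # Devanagari danda); combining vowel signs are marks, so they are preserved.
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    return cleaned.split()


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_wer(reference: str, hypothesis: str) -> float:
    """Edit distance over normalized words, divided by reference length."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted word out of five reference words.
    print(normalized_wer("Book a cab for me.", "book a cab to me"))
```

Real benchmark normalizers usually go further (numerals, spelling variants, transliteration), so scores are only comparable when the same normalizer is applied to every model.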
Try it out and let us know what you're building with it!
@itsmeparth can't find a better model for my Indian customers.
Are you also working on European languages (Spanish, German?), or are they coming soon?
Yes, absolutely.
Right now Parrot is focused on Indian conversations but European languages are on the roadmap, including Spanish and German.
We’re starting with the languages where we see the highest production demand, and then expanding coverage from there.
Congrats on the launch! Building voice sessions into a couples app right now (currently on Deepgram for streaming transcription), so the "voice agents don't get clean audio" framing really lands...clean-audio benchmarks oversell every STT model until you hit a real room. One thing I've run into that I'd love your take on: the hardest case isn't accent or noise, it's two people talking, overlapping speech, interruptions, one person finishing the other's sentence. Most STT degrades badly there. Is Parrot tuned mainly for the single-caller voice-agent case (one human, one agent), or does it hold up on genuine multi-speaker conversations? Curious whether that's a roadmap item or a deliberate scope line.
This looks really solid 🔥
Curious about latency and how it performs in noisy real-world calls compared to Whisper.
Thanks Vasyl!!
Whisper is excellent as a general-purpose ASR model, especially for offline and batch transcription.
Parrot is optimized more specifically for production voice agents: streaming calls, low latency from end of speech to final transcript, and messy real-world audio where the transcript needs to trigger the next action.
We’re also benchmarking on noisy call conditions and Hindi-heavy conversations, not just clean audio. Whisper is not specifically optimized for Indian accents, and its latency can be higher for real-time voice-agent use cases.
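If you want to compare models on that latency yourself, one rough approach is to log two timestamps per utterance (when speech ended and when the final transcript arrived) and look at tail percentiles, not the average. The sketch below assumes you already have those timestamps; the function names and the nearest-rank percentile method are illustrative choices, not part of any Parrot or Whisper API.

```python
import math


def final_transcript_latencies_ms(events: list[tuple[float, float]]) -> list[float]:
    """Convert (end_of_speech_s, final_transcript_s) pairs into latencies in ms."""
    return [(final_s - eos_s) * 1000 for eos_s, final_s in events]


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical timestamps in seconds, for illustration only.
    events = [(1.0, 1.25), (4.0, 4.5), (9.0, 9.25), (12.0, 13.0)]
    latencies = final_transcript_latencies_ms(events)
    print("P50 ms:", percentile(latencies, 50))
    print("P95 ms:", percentile(latencies, 95))
```

Tail percentiles matter here because a voice agent feels slow whenever a single turn stalls, even if the mean latency looks fine.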
Haha, how can something be this useful and this scary simultaneously!? As someone with a name most humans can't spell right, I look forward to the day when this is no longer an issue.
Exactly! Names are where STT gets very real very fast.
A big part of Parrot’s focus is making these real-world details more reliable, especially in Indian conversations.
Building a dedicated validation layer for Hindi downstream workflows is clever. Most generic STT APIs fall apart on code-switching and regional accents. We've hit similar walls where raw transcripts were too noisy for reliable intent parsing in production pipelines. How do you handle Hinglish code-switching, and what's the P95 latency on a 10-second audio chunk?
Congratulations on the launch! Btw, when I mix English with Hindi, I noticed it's a little biased towards transcribing the English words in Hindi script (Devanagari glyphs). Latency is impressive.
Just tried this out, amazing speed and accuracy. Great work!
Thanks Vedant, really appreciate you trying it out!
Speed + accuracy was the core goal for us because voice agents need both. A transcript has to be right, but it also has to arrive fast enough to keep the conversation natural.