Launching today
Tyto is a lightweight model that runs on your audio stream and predicts whether the audio reaching your agent will cause downstream failures. It outputs a single score plus a breakdown across six dimensions: noise, speaker reverb, speaker loudness, interfering speech, background media speech, packet loss. Try it here: https://ai-coustics-tyto-demo--ph.modal.run/

Hi everyone, I'm Fabian, co-founder at ai-coustics.
The launch page covers what our newest product does very well, but let me add why we built it.
Voice agents are moving into the real world, where audio is messy: think cars, call centers, kitchens, noisy streets. And in the real world they fail in ways the transcript never shows. Competing voices in particular (a TV in the background, a far-field speaker) and artifacts like packet loss on a bad connection can throw off an agent's performance. Teams see the bad outcome but have no idea the cause was audio. That blind spot is where trust in production voice AI breaks down.
Tyto (Audio Insight) is the solution for that blind spot. It analyses the input audio and scores it: how noisy, how reverberant, how much interfering speech, how likely the agent is to mishear. It's a signal you can actually act on.
It works in two modes: real-time monitoring so you catch failing calls as they happen or adjust the agent flow, and post-call analysis so you can finally answer what went wrong. And it runs on the same on-device ai-coustics SDK that's already shipped in production by voice AI teams.
Easiest way to feel it: point Tyto at a recording of one of your worst calls and watch it highlight exactly where the audio fell apart. Full write-up here. Link to documentation here.
We built this for the people shipping voice agents into hard environments. Tell us what's missing. We're reading every comment.
Congrats on the launch! I'm curious: what specific agent flow adjustments can Tyto trigger automatically when it detects interfering speech mid-conversation?
@crystalmei TLDR: Tyto gives you the raw metrics on how much interfering speech there is and gives Voice AI builders the flexibility to threshold those values and emit tags, which can be propagated to the LLM or voice agent to intervene.
Deeper Dive: Tyto outputs two interfering-speech metrics: one for in-the-room interfering speakers and one for devices playing content that contains speech. Both are numerical values from 0 (clean audio) to 1 (lots of interfering speech). Crucially, they are agent-agnostic, which gives builders control over how they use them.
The flow we would recommend is to run Tyto over your user audio and threshold the interfering speech metrics at 0.35 (medium) and 0.6 (poor). These bands can then be used in real time to propagate information (e.g. textual tags like "High Background Speech" or "TV/radio/device detected") to your Voice Agent or LLM.
You can also preemptively flush the agent's turn when the threshold is exceeded and have it tell the user to move somewhere quieter. We've seen use cases like these with some customers, which is pretty cool.
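To make the thresholding concrete, here is a minimal sketch in plain Python. It does not call the ai-coustics SDK. The function names (`band`, `build_tags`, `should_flush_turn`), the "good" label for scores below 0.35, the inclusive comparisons, and the mapping of each metric to a tag are all assumptions made for illustration. Only the 0.35 and 0.6 thresholds and the two tag strings come from the reply above.

```python
# Thresholds from the recommendation above; everything else is illustrative.
MEDIUM_THRESHOLD = 0.35
POOR_THRESHOLD = 0.6


def band(score: float) -> str:
    """Map a 0..1 interfering-speech score to a quality band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= POOR_THRESHOLD:
        return "poor"
    if score >= MEDIUM_THRESHOLD:
        return "medium"
    return "good"  # hypothetical label for the band below 0.35


def build_tags(room_speech: float, media_speech: float) -> list[str]:
    """Turn the two metrics into textual tags for the LLM or voice agent.

    Assumption: a tag is emitted as soon as a metric leaves the "good" band.
    """
    tags = []
    if band(room_speech) != "good":
        tags.append("High Background Speech")
    if band(media_speech) != "good":
        tags.append("TV/radio/device detected")
    return tags


def should_flush_turn(room_speech: float, media_speech: float) -> bool:
    """Assumption: only flush the agent's turn when either metric is poor."""
    return "poor" in (band(room_speech), band(media_speech))
```

With this shape, the tags can be appended to the agent's context on every scored chunk, while `should_flush_turn` gates the more disruptive step of asking the user to move somewhere quieter.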
Hope that helps! You can find lots more information in the Tyto guide in the ai-coustics docs :)
Looks great! Does running on-device add latency to the live call, or is the scoring free?
@louislecat In 'real-time' mode, Tyto scans chunks of audio (the chunk length depends on how you set it, for example 5 seconds progressively) and sends the scores after analyzing each chunk. It doesn't add latency to the call itself.
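As a rough illustration of chunked scoring, the sketch below buffers incoming audio frames and yields fixed-length chunks that could be handed to a scorer separately from the call audio. It is plain Python and does not use the ai-coustics SDK. The name `chunk_samples`, the non-overlapping windows, and dropping the trailing partial chunk are assumptions; only the 5-second example length comes from the reply above.

```python
from typing import Iterable, Iterator, List


def chunk_samples(
    frames: Iterable[List[float]],
    sample_rate: int,
    chunk_seconds: float = 5.0,
) -> Iterator[List[float]]:
    """Group streamed audio frames into fixed-length chunks for scoring.

    Assumption: chunks do not overlap, and a trailing partial chunk is dropped.
    """
    chunk_size = int(sample_rate * chunk_seconds)
    if chunk_size <= 0:
        raise ValueError("sample_rate * chunk_seconds must be positive")

    buffer: List[float] = []
    for frame in frames:
        buffer.extend(frame)
        # Emit every complete chunk currently held in the buffer.
        while len(buffer) >= chunk_size:
            yield buffer[:chunk_size]
            buffer = buffer[chunk_size:]
```

Because the chunks are consumed from a copy of the stream, the audio path to the agent is not held up while a chunk fills or is scored.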
Straighty.app
It basically feels like voice AI agents are not deaf anymore! I think some STT engines already had a degree of audio intelligence, such as understanding certain sounds, but this acoustic awareness is a whole new level!
Hey hey, Mila from ai-coustics here - I'm looking forward to seeing what you all think!
If you want to try it in your own infrastructure right away, you can get your SDK key here.
You can find full docs here: https://docs.ai-coustics.com/