
Hush
Open-source noise suppression for voice AI agents
339 followers
Hush removes competing voices, background noise, and audio interference from real-time calls so your voice AI agents always hear what matters.




Mailwarm
What’s the latency like in real time calls, and does it ever clip or distort the speaker’s voice?
Hush
@thamibenjelloun Total pipeline latency is around 12-13 ms, including buffering and resampling, which is imperceptible on a live call.
On clipping: the model applies a per-frame gain mask rather than hard gating, so it enhances quieter speech rather than cutting it off. Distortion artifacts are something we specifically optimized against since we measured downstream ASR accuracy (WER), not just how it sounds to a human ear.
That said, the best way to know is to run it on your own audio. Weights are on GitHub, Apache 2.0. Would love to hear what you find.
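To make the gain-mask versus hard-gating distinction concrete, here is a minimal sketch. It is not Hush's actual code: the band magnitudes, the gate threshold, and the gain values are hand-picked, hypothetical numbers standing in for what a model would predict, and gains are assumed to lie in [0, 1].

```python
def hard_gate(bands, threshold):
    """Hard gating: mute the whole frame when its energy is below the threshold."""
    energy = sum(b * b for b in bands)
    if energy < threshold:
        return [0.0] * len(bands)
    return list(bands)


def apply_gain_mask(bands, gains):
    """Gain masking: scale each band by its own gain, clamped to [0, 1] (assumed range)."""
    if len(bands) != len(gains):
        raise ValueError("bands and gains must have the same length")
    return [b * min(1.0, max(0.0, g)) for b, g in zip(bands, gains)]


# One quiet frame: the first two bands carry speech, the last two carry noise.
# All numbers are illustrative, not measured.
quiet_speech = [0.20, 0.15, 0.02, 0.01]
gains = [1.0, 0.9, 0.1, 0.0]  # hypothetical per-band gains for this frame

gated = hard_gate(quiet_speech, threshold=0.1)
masked = apply_gain_mask(quiet_speech, gains)

print("hard gate:", gated)   # the whole frame is muted, speech included
print("gain mask:", masked)  # speech bands survive, noise bands are attenuated
```

The gate drops the entire quiet frame, while the mask keeps the speech bands and only turns down the bands it considers noise.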
Hush
Hey everyone 👋 I'm the maker of Hush. Here's the story behind why we built it.
We build Voice AI at Weya: AI agents that handle live phone calls for businesses. The #1 issue that kept breaking our pipeline wasn't the LLM or the TTS. It was background speech.
A caller phones in from a busy restaurant. Their colleague is talking next to them. A TV is blaring in the background. What happens? The background speaker's words get picked up, transcribed, and fed into the AI agent as if the caller said them. The entire conversation derails.
We tried every open-source noise cancellation model out there: DeepFilterNet3, RNNoise, SEGAN, MetricGAN+, DNS Challenge entrants. They all do a great job suppressing stationary noise (fans, traffic, HVAC). But none of them treat a competing human voice as a first-class problem. When the interference is another person speaking, speech looks like speech in every feature these models have learned. They either let it leak through, or they suppress both speakers and destroy intelligibility.
So we built Hush from scratch to fix exactly this.
What it does: Hush removes both background noise AND background speech from live audio, isolating only the primary speaker. It's an 8 MB model that runs fully on CPU in real time (<1 ms per 10 ms frame), at 16 kHz (the wideband telephony sample rate).
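For a rough sense of what those numbers imply, here is a small back-of-the-envelope calculation. It takes 1 ms as the worst-case compute time per frame (the post says less than that), and the streams-per-core figure is an idealized bound that ignores scheduling, I/O, and resampling overhead.

```python
SAMPLE_RATE_HZ = 16_000      # from the post
FRAME_MS = 10                # from the post
COMPUTE_MS_PER_FRAME = 1.0   # upper bound from the post ("<1 ms")

# Samples the model sees in each frame.
samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000

# Frames that must be processed per second of audio.
frames_per_second = 1000 // FRAME_MS

# Real-time factor: compute time divided by audio duration (below 1.0 means real time).
real_time_factor = COMPUTE_MS_PER_FRAME / FRAME_MS

# Idealized number of concurrent streams a single core could keep up with.
ideal_streams_per_core = int(FRAME_MS / COMPUTE_MS_PER_FRAME)

print(samples_per_frame, frames_per_second, real_time_factor, ideal_streams_per_core)
```

Since the stated compute time is under 1 ms, the real-time factor is below 0.1 and the idealized stream count is at least 10 per core.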
How we did it: We extended DeepFilterNet3 with one targeted change: teaching the encoder to distinguish speakers, not just speech from noise.
Training data that reflects the real problem: 60% of our training samples include a competing human speaker mixed in. The model cannot minimize its training loss without learning to suppress interference that is itself speech.
Auxiliary Separation Head: A lightweight Linear(256→32) + Sigmoid head attached to the encoder bottleneck, trained with L1 loss to predict an ERB-domain mask for background speakers. This is a training-only objective. It forces the encoder to carry speaker-discriminative features without adding any inference cost.
Production runtime in Rust: We built libweya_nc, a C-ABI shared library (Rust + tract for ONNX inference) that ships as a ~10 MB .so/.dylib/.dll with no embedded model. It shares compiled model weights across concurrent sessions via Arc<TypedSimplePlan>, so each session costs only a few KB of memory. Plug it into any C, C++, or Python application.
We trained on 10,000+ hours of mixed audio: LibriSpeech, VCTK, Common Voice for clean speech, DNS Challenge + FreeSound + ESC-50 for noise, and MIT IR Survey + OpenAIR for room impulse responses.
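The auxiliary separation head described above can be sketched as follows. This is a plain-Python illustration of a Linear(256→32) + Sigmoid layer scored with an L1 loss, not the project's training code, which presumably uses a deep learning framework. The class and function names, the weight initialization, and the random features and target mask are all hypothetical.

```python
import math
import random

ENC_DIM = 256    # encoder bottleneck width, from the post
ERB_BANDS = 32   # size of the ERB-domain mask, from the post


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class AuxSeparationHead:
    """Linear(in_dim -> out_dim) followed by a sigmoid (hypothetical name)."""

    def __init__(self, in_dim=ENC_DIM, out_dim=ERB_BANDS, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)  # assumed init, not from the post
        self.weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def forward(self, features):
        """Map one frame of encoder features to a mask with values in (0, 1)."""
        if len(features) != len(self.weight[0]):
            raise ValueError("unexpected feature size")
        return [sigmoid(sum(w * f for w, f in zip(row, features)) + b)
                for row, b in zip(self.weight, self.bias)]


def l1_loss(pred, target):
    """Mean absolute error between predicted and reference masks."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Stand-in data: random encoder features and a random reference mask
# for the background speaker (real targets would come from the mixture).
rng = random.Random(1)
features = [rng.uniform(-1.0, 1.0) for _ in range(ENC_DIM)]
target_mask = [rng.random() for _ in range(ERB_BANDS)]

head = AuxSeparationHead()
pred_mask = head.forward(features)
aux_loss = l1_loss(pred_mask, target_mask)
print(len(pred_mask), round(aux_loss, 4))
```

Because the head is only used to compute this extra loss during training, it can be dropped from the exported model, which is consistent with the post's point that it adds no inference cost.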
Why we open-sourced it: This gap exists because the benchmarks that drive open-source development (DNS Challenge, CHiME) measure noise suppression, not speaker isolation. Models optimized for those benchmarks are not optimized for Voice AI. We want to change that. Every team building voice agents, call centre bots, real-time transcription, or conversational AI systems deserves a model that actually handles the acoustic chaos of real phone calls.
The model, training code, Rust runtime library, and pretrained weights are all on GitHub and Hugging Face. MIT / Apache 2.0 licensed.
We're also fine-tuning a v2 optimized for even louder background noise and speech. Stay tuned.
Would love your feedback. Happy to answer any questions about the architecture, training, or how to integrate it 🙌
sub-1ms-per-frame on cpu is the easy number to benchmark — on real-time agent pipelines the harder thing is frame jitter compounding across the STT→LLM→TTS hop, where any lookahead the suppressor needs eats the latency budget you saved. gain-mask + deep filtering also has the target-speaker vs blind-separation question lurking when two voices overlap mid-utterance.
Hush
Thanks everyone for the amazing support so far! We're excited to hear your thoughts and answer any questions you have. Your feedback will help shape the future of Hush.
Hush
@princeperspect Glad you like it! Let us know how your experience with Hush goes.
Clean input audio is half the battle for voice agents and most teams underrate it. Open-sourcing it is generous. Will be poking at the repo. Congrats on the launch Atul!
Hush
@eitan_elnekave Thanks! You're absolutely right that audio quality is often the hidden bottleneck in voice agents. Really appreciate the kind words and interest in the repo. Looking forward to seeing what you find!
Tough Tongue AI
This seems pretty useful. We would love to give it a try!
Hush
@aj_123 🙌🏻