Launching today

Hush
Open-source noise suppression for voice AI agents
139 followers
Hush removes competing voices, background noise, and audio interference from real-time calls so your voice AI agents always hear what matters.




Refocus
The CPU-only, sub-1ms-per-frame number is what jumped out at me. Most enhancement I've tried adds enough latency to break the natural turn-taking on a live call. We build voice AI that phones elderly parents at home, where the hard part is exactly what you describe: a TV going in the background, a spouse talking across the room, sometimes a hearing aid whistling. My question: when the primary speaker is quiet, slurred, or unsteady (pretty common with older users), does isolating them ever clip that softer speech? Planning to test Hush on some of our real call audio.
Hush
@igorgurovich Thanks! That's a great use case. To answer your question: the model applies a gain mask and deep filtering per frame; it doesn't gate or hard-clip. So quieter speech gets enhanced, not cut. That said, we optimized primarily for telephony scenarios with a clearly dominant primary speaker. Slurred or very low-energy speech at low SNR is a harder edge case, and I'd honestly want to see how it performs on your specific audio before making promises. Please do test it on your real call data and share what you find. We'd love to hear how it holds up, and if there are failure modes with elderly speakers, that's exactly the kind of feedback that would shape v2.
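To make the "gain mask, not a gate" point concrete, here is a minimal sketch, assuming per-bin spectral magnitudes and gains in [0, 1]. It covers only the masking step (not deep filtering), and the functions and numbers are hypothetical illustrations, not Hush's code. A gate zeroes every bin below a fixed threshold, so a quiet talker disappears entirely; a mask scales each bin by how speech-like it is judged to be, so quiet speech is attenuated in proportion rather than cut.

```python
# Hypothetical illustration: soft per-bin gain mask vs. a hard noise gate.
# Neither function is Hush's implementation; the values are made up.


def apply_gain_mask(frame_bins, gains):
    """Scale each spectral magnitude by a gain clamped to [0, 1] (soft masking)."""
    if len(frame_bins) != len(gains):
        raise ValueError("one gain per bin is required")
    return [m * min(max(g, 0.0), 1.0) for m, g in zip(frame_bins, gains)]


def hard_gate(frame_bins, threshold):
    """Zero every bin whose magnitude falls below a fixed threshold (hard gating)."""
    return [m if m >= threshold else 0.0 for m in frame_bins]


# A quiet speaker: every bin sits below a gate threshold of 0.1.
quiet_speech = [0.04, 0.06, 0.05, 0.03]
# A mask that judges these bins to be mostly speech.
mask = [0.9, 0.95, 0.9, 0.8]

print(apply_gain_mask(quiet_speech, mask))     # small but non-zero magnitudes
print(hard_gate(quiet_speech, threshold=0.1))  # [0.0, 0.0, 0.0, 0.0]
```

The open question from the comment above is exactly whether the mask's gains stay high enough for slurred, low-energy speech at low SNR, which is something only real call audio can answer.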
Sub-1ms on CPU is the claim that matters most here and also the one I'd want stress-tested. What's the degradation curve? Does it hold at 1ms with a single stream, and what happens at 10 or 50 concurrent calls on commodity hardware? That's the production reality for anyone running voice agents at scale.
The open-source angle is smart for adoption but the real question is where the commercial model sits. Apache 2.0 gets you into production stacks fast. What's the wedge that converts users to paying customers?
Hush
@sergio_jivan Good questions. On concurrency: the Rust runtime shares the compiled ONNX model across all sessions via a single Arc<TypedSimplePlan>. Each additional session allocates only its own frame buffers (a few KB), not a copy of the model. So 50 concurrent streams means 50 independent inference calls on the same ~10 MB model, not 50x the memory. On a 4-core machine we've tested, per-frame latency stays at about 1ms up to roughly 40 concurrent streams before CPU contention starts pushing it higher. Capacity scales linearly with core count.
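For anyone wanting to stress-test the degradation curve themselves, here is a minimal harness sketch, assuming a stand-in model object. `SharedModel`, `Session`, and `measure` are hypothetical names, and the "model" is a trivial multiply, not Hush. It mirrors the structure described above (one model shared by reference, a separate frame buffer per session) and reports median and p99 per-frame latency at each concurrency level. Because CPython threads share the GIL, the absolute numbers it prints say nothing about the Rust runtime; to get a real curve, swap `process` for a call into the native library.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

FRAME_SAMPLES = 160  # 10 ms of audio at 16 kHz


class SharedModel:
    """Stand-in for the compiled model: built once, only read afterwards."""

    def __init__(self):
        # Hypothetical weights. In the Rust runtime described above, this role
        # is played by the single Arc<TypedSimplePlan> that every session shares.
        self.weights = [0.5] * FRAME_SAMPLES

    def process(self, frame):
        return [s * w for s, w in zip(frame, self.weights)]


class Session:
    """Per-call state: a reference to the shared model plus its own buffer."""

    def __init__(self, model):
        self.model = model                  # shared reference, never copied
        self.frame = [0.0] * FRAME_SAMPLES  # per-session frame buffer
        self.latencies_ms = []

    def run(self, n_frames):
        for _ in range(n_frames):
            start = time.perf_counter()
            self.frame = self.model.process(self.frame)
            self.latencies_ms.append((time.perf_counter() - start) * 1000.0)
        return self.latencies_ms


def measure(n_sessions, n_frames=100):
    """Run n_sessions concurrently; return (sessions, median_ms, p99_ms)."""
    model = SharedModel()
    sessions = [Session(model) for _ in range(n_sessions)]
    with ThreadPoolExecutor(max_workers=n_sessions) as pool:
        per_session = list(pool.map(lambda s: s.run(n_frames), sessions))
    all_ms = sorted(ms for lat in per_session for ms in lat)
    p99 = all_ms[int(0.99 * (len(all_ms) - 1))]
    return sessions, statistics.median(all_ms), p99


for n in (1, 10, 50):
    _, median_ms, p99_ms = measure(n)
    print(f"{n:>3} sessions: median {median_ms:.4f} ms, p99 {p99_ms:.4f} ms")
```

Tracking p99 alongside the median matters here: contention usually shows up in the tail well before it moves the median.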
On the commercial question: Hush is genuinely open source, with no "open core" catch. The model and runtime are what we built for our own voice agent platform at Weya. Open-sourcing them is about closing a gap in the ecosystem that was hurting everyone building in this space, including us. Weya's business is not the noise cancellation layer; it's the omnichannel agent platform itself, which orchestrates entire workflows using voice, video, and WhatsApp agents.
Foyer
Most noise suppression libraries are built for human listeners, where "good enough" means the person on the other end doesn't notice. For voice AI agents the bar is different because the model is doing ASR first, and artifacts that a human brain filters out can wreck transcription accuracy pretty badly. Curious whether Hush is tuned specifically for that ASR pipeline use case or whether it's general-purpose suppression you're applying upstream. Also wondering how it handles near-field keyboard noise and fan hum during long agent sessions, since those tend to be the consistent offenders in real deployments.
Hush
@fberrez1 Great framing, you've nailed exactly why we built this the way we did.
You're right that the bar for voice AI is fundamentally different from human-listener suppression. A human brain is remarkably forgiving of artifacts; an ASR model isn't. A subtle spectral smear or over-aggressive suppression can flip a phoneme, and suddenly your agent is acting on the wrong intent. That's a real business failure, not just a quality issue.
Hush is explicitly tuned for the upstream-of-ASR use case. Our eval loop during training measured downstream transcription accuracy (WER), not just perceptual scores like PESQ or DNSMOS. If suppression was introducing artifacts that hurt WER, we treated that as a model failure regardless of how "clean" it sounded to a human ear.
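For readers who haven't computed WER before, here is a minimal sketch of the standard word-level metric: word edit distance (substitutions + deletions + insertions) divided by the number of reference words. This is the textbook definition, not Hush's evaluation code, and the transcripts are hypothetical. It shows why a single flipped word matters: one substitution in a four-word utterance is already 25% WER.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


reference = "please cancel my order"
print(word_error_rate(reference, "please cancel my order"))   # 0.0
print(word_error_rate(reference, "please cancel my border"))  # 0.25
```

In an eval loop like the one described above, you would transcribe the enhanced audio and the clean reference with the same ASR model and compare WER across noise conditions, rather than relying on PESQ or DNSMOS alone.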
On keyboard noise and fan hum: stationary and near-stationary noise is actually the easier problem. The model handles those well since it was trained heavily on DNS Challenge data, which includes exactly those profiles. Long agent sessions with consistent fan hum are arguably the cleanest scenario Hush faces. Where it earns its keep is when a second human voice enters the frame mid-session, which is what breaks every other model we tested.
Happy to share some WER comparison numbers across noisy conditions if that's useful for your evaluation.
Hush
Hey Product Hunt! I'm @lordhasanali, CEO of Weya AI.
We watched great voice AI fail in production, over and over, not because of the model, but because of the audio. Noisy environments, competing voices, background hum. Nobody was solving this properly, so we did.
Introducing Hush, our first in-house open-source speech enhancement model, which:
• Isolates the primary speaker and removes everything else in real time
• Runs entirely on CPU, under 1ms per frame - no GPU needed
• Language-agnostic - works across all spoken languages out of the box
• Apache 2.0 - free to use in production today
We launched at #5 on Hugging Face's Audio-to-Audio leaderboard, and this is just the start.
We'll be here all day answering questions. Try it, break it, and let us know what you think!
Hush
Hey everyone 👋 I'm the maker of Hush. Here's the story behind why we built it.
We build voice AI at Weya: AI agents that handle live phone calls for businesses. The #1 issue that kept breaking our pipeline wasn't the LLM or the TTS. It was background speech.
A caller phones in from a busy restaurant. Their colleague is talking next to them. A TV is blaring in the background. What happens? The background speaker's words get picked up, transcribed, and fed into the AI agent as if the caller said them. The entire conversation derails.
We tried every open-source noise cancellation model out there: DeepFilterNet3, RNNoise, SEGAN, MetricGAN+, DNS Challenge entrants. They all do a great job suppressing stationary noise (fans, traffic, HVAC). But none of them treat a competing human voice as a first-class problem. When the interference is another person speaking, speech looks like speech in every feature these models have learned. They either let it leak through, or they suppress both speakers and destroy intelligibility.
So we built Hush from scratch to fix exactly this.
What it does: Hush removes both background noise AND background speech from live audio, isolating only the primary speaker. It's an 8 MB model that runs fully on CPU in real time (<1 ms per 10 ms frame), at 16 kHz (the wideband telephony sample rate).
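The numbers above imply a simple compute budget, sketched here as plain arithmetic. The inputs are the figures from the post (16 kHz, 10 ms frames, up to 1 ms of inference per frame); `frame_budget` is a hypothetical helper. The stream ceiling is an idealized upper bound that assumes one core can run back-to-back inference with no scheduling, I/O, or other overhead, so real capacity will be lower.

```python
def frame_budget(sample_rate_hz=16_000, frame_ms=10, infer_ms=1.0, cores=1):
    """Return (samples per frame, real-time factor, idealized max concurrent streams)."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    # Real-time factor: compute time spent per second of audio processed.
    real_time_factor = infer_ms / frame_ms
    # Each core has frame_ms of wall time per frame period to spend on inference.
    max_streams = cores * int(frame_ms // infer_ms)
    return samples_per_frame, real_time_factor, max_streams


print(frame_budget(cores=4))  # (160, 0.1, 40)
```

On 4 cores this ceiling works out to 40 streams, which lines up with the roughly 40 concurrent streams reported earlier in the thread before latency starts to climb.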
How we did it: We extended DeepFilterNet3 with one targeted change: teaching the encoder to distinguish speakers, not just speech from noise.
Training data that reflects the real problem: 60% of our training samples include a competing human speaker mixed in. The model cannot pass training without learning to suppress speech that sounds like speech.
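A minimal sketch of how such a training mix could be assembled, assuming mono sample lists of equal length. The 60% probability comes from the post; the SNR and SIR ranges, the `rms`/`scale_to_ratio`/`make_training_mix` helpers, and the synthetic signals are hypothetical, and this is not Hush's data pipeline. The key point is that the training target stays the clean primary speaker, so any competing speech in the input is, by construction, something the model must remove.

```python
import math
import random

COMPETING_SPEAKER_PROB = 0.6  # share of samples with a second talker, per the post


def rms(x):
    return math.sqrt(sum(s * s for s in x) / len(x))


def scale_to_ratio(target, interferer, ratio_db):
    """Scale `interferer` so 20*log10(rms(target) / rms(scaled)) equals ratio_db."""
    gain = rms(target) / (rms(interferer) * 10 ** (ratio_db / 20))
    return [gain * s for s in interferer]


def make_training_mix(clean, noise, other_speech, rng):
    """Return (noisy_input, clean_target, has_competing_speaker)."""
    snr_db = rng.uniform(-5, 20)  # hypothetical signal-to-noise range
    mix = [c + n for c, n in zip(clean, scale_to_ratio(clean, noise, snr_db))]
    has_speaker = rng.random() < COMPETING_SPEAKER_PROB
    if has_speaker:
        sir_db = rng.uniform(0, 15)  # hypothetical signal-to-interference range
        interf = scale_to_ratio(clean, other_speech, sir_db)
        mix = [m + s for m, s in zip(mix, interf)]
    # The target is always the clean primary speaker, never the interferer.
    return mix, clean, has_speaker


rng = random.Random(0)
clean = [math.sin(0.05 * n) for n in range(1600)]   # stand-in primary speech
noise = [rng.gauss(0, 1) for _ in range(1600)]      # stand-in noise clip
other = [math.sin(0.13 * n) for n in range(1600)]   # stand-in second talker
noisy, target, flagged = make_training_mix(clean, noise, other, rng)
```

Note that `scale_to_ratio` divides by the interferer's RMS, so an all-zero clip would need to be filtered out before mixing.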
Auxiliary Separation Head: A lightweight Linear(256→32) + Sigmoid head attached to the encoder bottleneck, trained with L1 loss to predict an ERB-domain mask for background speakers. This is a training-only objective. It forces the encoder to carry speaker-discriminative features without adding any inference cost.
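The auxiliary head described above can be sketched in plain Python. The layer shape (256 bottleneck features to 32 ERB-band outputs), the sigmoid, and the L1 loss come from the post; the initialization, the `aux_weight` of 0.1, and all names are hypothetical, and a real training loop would use a framework layer (for example a `torch.nn.Linear`) rather than nested lists. Because the head only contributes to the training loss, it can simply be left out of the exported inference graph, which is why it adds no runtime cost.

```python
import math
import random

BOTTLENECK_DIM = 256  # encoder bottleneck width, per the post
N_ERB_BANDS = 32      # auxiliary head output size, per the post


class AuxSeparationHead:
    """Linear(256 -> 32) followed by a sigmoid; used only during training."""

    def __init__(self, rng):
        bound = 1 / math.sqrt(BOTTLENECK_DIM)  # hypothetical uniform init
        self.weight = [[rng.uniform(-bound, bound) for _ in range(BOTTLENECK_DIM)]
                       for _ in range(N_ERB_BANDS)]
        self.bias = [0.0] * N_ERB_BANDS

    def forward(self, z):
        """Map one bottleneck vector to a per-ERB-band background-speaker mask."""
        out = []
        for row, b in zip(self.weight, self.bias):
            logit = sum(w * x for w, x in zip(row, z)) + b
            out.append(1 / (1 + math.exp(-logit)))
        return out


def l1_loss(pred, target):
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def training_loss(main_loss, head, z, bg_speaker_mask, aux_weight=0.1):
    """Enhancement loss plus a weighted auxiliary L1 term on the background-speaker mask."""
    return main_loss + aux_weight * l1_loss(head.forward(z), bg_speaker_mask)


rng = random.Random(0)
head = AuxSeparationHead(rng)
z = [rng.gauss(0, 1) for _ in range(BOTTLENECK_DIM)]  # stand-in bottleneck features
predicted_mask = head.forward(z)
```

The gradient of the auxiliary term flows back into the encoder, which is what pushes the bottleneck to carry speaker-discriminative information for the main decoder to use.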
Production runtime in Rust: We built libweya_nc, a C-ABI shared library (Rust + tract for ONNX inference) that ships as a ~10 MB .so/.dylib/.dll with no embedded model. It shares compiled model weights across concurrent sessions via Arc<TypedSimplePlan>, so each session costs only a few KB of memory. Plug it into any C, C++, or Python application.
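When wiring a frame-based model into a call pipeline, audio usually arrives in packets that don't line up with the model's 10 ms frames. Here is a minimal sketch of the per-session buffering this implies, assuming 16 kHz audio (160-sample frames). `FrameBuffer`, `enhance_stream`, and `process_frame` are hypothetical names; the post doesn't list libweya_nc's C functions, so `process_frame` is a placeholder where a call into the native library would go.

```python
FRAME_SAMPLES = 160  # 10 ms at 16 kHz


class FrameBuffer:
    """Per-session carry-over buffer: accepts packets of any length and
    returns complete fixed-size frames, keeping the remainder for next time."""

    def __init__(self, frame_samples=FRAME_SAMPLES):
        self.frame_samples = frame_samples
        self._pending = []

    def push(self, packet):
        self._pending.extend(packet)
        frames = []
        while len(self._pending) >= self.frame_samples:
            frames.append(self._pending[:self.frame_samples])
            del self._pending[:self.frame_samples]
        return frames

    @property
    def pending(self):
        """Number of buffered samples not yet forming a full frame."""
        return len(self._pending)


def enhance_stream(packets, process_frame):
    """Apply process_frame (placeholder for the native call) to every full frame."""
    buf = FrameBuffer()
    out = []
    for packet in packets:
        for frame in buf.push(packet):
            out.extend(process_frame(frame))
    return out, buf.pending


# Packets of 100, 250 and 50 samples yield two full frames with 80 samples left over.
packets = [list(range(0, 100)), list(range(100, 350)), list(range(350, 400))]
enhanced, leftover = enhance_stream(packets, lambda frame: frame)  # identity stand-in
print(len(enhanced), leftover)  # 320 80
```

Keeping this buffer per session, rather than shared, matches the design described above: the model is shared, while the streaming state is small and private to each call.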
We trained on 10,000+ hours of mixed audio: LibriSpeech, VCTK, Common Voice for clean speech, DNS Challenge + FreeSound + ESC-50 for noise, and MIT IR Survey + OpenAIR for room impulse responses.
Why we open-sourced it: This gap exists because the benchmarks that drive open-source development (DNS Challenge, CHiME) measure noise suppression, not speaker isolation. Models optimized for those benchmarks are not optimized for Voice AI. We want to change that. Every team building voice agents, call centre bots, real-time transcription, or conversational AI systems deserves a model that actually handles the acoustic chaos of real phone calls.
The model, training code, Rust runtime library, and pretrained weights are all on GitHub and Hugging Face. MIT / Apache 2.0 licensed.
We're also fine-tuning a v2 optimized for even louder background noise and speech. Stay tuned.
Would love your feedback. Happy to answer any questions about the architecture, training, or how to integrate it 🙌
Hush
Thanks everyone for the amazing support so far! We're excited to hear your thoughts and answer any questions you have. Your feedback will help shape the future of Hush.
Hush
@princeperspect Glad you like it! Let us know how your experience with Hush goes.