If you use Aqua Voice, WisprFlow or SuperWhisper, you know the drill: talk to a wall, hit stop, pray the AI didn't hallucinate.
Nobody does live transcription because it completely breaks standard AI models. We spent months figuring out local agreement, audio buffering, and aggressive real-time self-correction just so you can finally edit at the speed of thought.
Live transcription is not final transcription, but faster. It is a trust problem.
When someone is still speaking, the model is decoding partial audio: clipped phonemes, silence, background noise, half-finished words, and sentences that may still change direction.
Juno
Hey Product Hunt - Jaski here, developer behind Juno.
Voice is becoming the new keyboard. Juno is the open voice layer for Mac.
I built it for how I actually work: long prompts, product notes, specs, messages, emails. I wanted to speak naturally, see the transcript live, and get finished writing inside the app I was already using.
Juno turns messy speech into clean writing, rewrites selected text by voice, and creates Notes, Reminders, and Alarms.
If software hears your voice and understands your screen, it has to be local, open, and unlimited. For this category, open source is not a feature. It is the only acceptable architecture.
Hence, Juno is fully local, open source, and free forever.
Use Juno and share your feedback with me. https://usejuno.co/
If you are a tech nerd, here is how the best voice-to-text app was built: Inside Juno.
@jas_jaski Excited to hunt Juno today! Many congratulations on launching :)
Jas reached out yesterday to showcase the product, and I was eager to learn more about what really differentiates it. He shared that Juno offers commands, snippets, live transcriptions, and is completely free and local.
All of this made me excited to hunt it, and I'm looking forward to seeing how Juno might incorporate features similar to @Typeahead in the future.
Juno
@rohanrecommends Thanks for the support and the hunt, your advice yesterday was spot on. I'm pumped for the community to try the live transcriptions. It was a huge technical hurdle to make it happen locally, which I wrote about here: https://cassiniresearch.com/products/juno/blog/why-live-transcriptions-are-hard.html
@jas_jaski Nice launch, Jaski, congrats. Curious about the local/open design. What trade-offs did you make to keep everything fully local, and how should a user decide whether Juno’s local approach is the right fit for their workflow versus a cloud-based voice tool?
Juno
@swati_paliwal Thanks Swati!
Honestly, the goal was to not trade anything away on quality - that was the hard part. Most voice tools are smart because they lean on the cloud; the fully local ones are private but weaker. We refused to pick. So we built the entire speech stack to run on-device on Apple silicon - it hears you, works out what you meant, and rewrites it clean, all on your Mac, in real time. It's the only one doing live transcription fully local, it works offline (plane, no wifi, anywhere), nothing you say ever leaves the machine, and it's free and open source.
If you want top-tier accuracy, live transcriptions and you refuse to hand over your data to a server or pay monthly subscriptions, Juno is the clear choice.
We spent weeks engineering this - the full how is here: https://cassiniresearch.com/products/juno/blog/
The "local" part is what actually matters here, and I'm curious how far it goes. Is the speech-to-text model running fully on-device with no outbound calls at all, or does "local" mean the app itself is local but transcription still hits an external API? That distinction is the whole ballgame for anyone who'd use this with sensitive work. Also wondering how it handles domain-specific vocabulary, medical terms, code identifiers, things that generic transcription models consistently mangle.
Juno
@fberrez1 Thanks for the comment. "Local" here means Juno makes no external API calls at all. Your speech is previewed, transcribed, and converted into the correct text or action entirely on-device, so you can run it fully offline, for example on a flight. You can check out our architecture here: https://cassiniresearch.com/products/juno/blog/inside-juno-local-voice-layer.html
One of the most interesting launches today! The part some people might miss is that live transcription brings its own trust problem: a word commits on screen, then self-correction rewrites it a beat later, and now I'm watching my own text flicker. How long does a token sit provisional before you lock it, and does the local agreement step ever lose a race against me already talking past it?
Juno
@artstavenka1 We don’t lock by “token age”; the live lane is word-hypothesis based.
Juno runs local Whisper in a resident preview process, then applies LocalAgreement over rolling word hypotheses. A word only moves into the committed prefix after two consecutive decodes agree on the same normalized word at the same position.
The unstable suffix stays as tail evidence internally; in the production HUD we bias toward showing committed text only, so the flickery part doesn’t get promoted onto the screen as “done.”
Current launch config also has a 600ms draft horizon: even if two decodes agree, words whose timestamps land inside the last 600ms of buffered audio are demoted back to tail because that’s the truncated-window zone where Whisper can confidently invent continuations. Those words commit one decode later if more audio confirms them, or through the final/silence confirmation path when the utterance actually ends.
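For readers who want to see the commit rule concretely, here is a minimal Python sketch of the behavior described above: a word joins the committed prefix only when two consecutive decodes agree on the same normalized word at the same position, and words ending inside the last 600ms of buffered audio stay in the tail. This is not Juno's actual code. The names `Word`, `LocalAgreement`, `normalize`, and `update` are hypothetical, and the sketch assumes every decode re-transcribes the utterance from its start so that word positions line up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Word:
    text: str
    end_ms: int  # where the word ends inside the buffered audio (assumed available)


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so 'Hello,' and 'hello' compare equal."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


@dataclass
class LocalAgreement:
    draft_horizon_ms: int = 600  # the launch value quoted in the reply above
    committed: list[str] = field(default_factory=list)
    previous: list[Word] = field(default_factory=list)

    def update(self, hypothesis: list[Word], buffer_ms: int) -> tuple[list[str], list[str]]:
        """Feed one decode of the whole buffer; return (committed, tail)."""
        i = len(self.committed)
        while i < len(hypothesis) and i < len(self.previous):
            word = hypothesis[i]
            # Rule 1: two consecutive decodes must agree at this position.
            if normalize(word.text) != normalize(self.previous[i].text):
                break
            # Rule 2: words ending inside the draft horizon stay provisional.
            if word.end_ms > buffer_ms - self.draft_horizon_ms:
                break
            self.committed.append(word.text)
            i += 1
        self.previous = hypothesis
        tail = [w.text for w in hypothesis[len(self.committed):]]
        return list(self.committed), tail
```

Calling `update` once per decode returns the text that is safe to show as done plus the unstable tail. A word that two decodes agree on but that still sits inside the horizon is held back, and commits on a later decode once more audio has arrived. The final/silence confirmation path mentioned above is not modeled here.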
Refocus
The fully-local, offline architecture is the part I keep coming back to. I work on voice AI for older adults, and the thing that breaks most transcription is atypical speech: slower pacing, disfluencies, regional accents the generic models were never trained on. Since Juno runs the model on-device, can users add custom vocabulary or adapt it to a specific speaker over time, or is the acoustic model fixed? And with no cloud fallback, what happens to a low-confidence segment, does it surface uncertainty or just commit to a best guess?
Juno
@igorgurovich atypical speech is exactly where a raw model falls apart - which is the whole reason we built a harness around it, not just a model.
To your questions - yes, Juno can do the following:
Custom vocabulary - you can add the names, jargon and domain terms generic models routinely miss
Snippets - save shortcuts that expand into long text you'd otherwise retype
Live transcript - you see every word the moment you say it, so a misheard one gets fixed on the spot, not a minute later
Adapting to a speaker - the acoustic model doesn't retrain itself on each person's voice. The adaptation comes from the layers wrapped around it: your custom vocab, context pulled from what's on your screen, and a set of deterministic local checks on the output
Low-confidence words, with no cloud to fall back on - the checks catch a good chunk before they ever land; for whatever's left, the live transcript means you catch it yourself, so it's never a silent best guess.
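As a rough illustration of what custom vocabulary and snippets could look like as deterministic local checks, here is a small Python sketch. It is an assumption about the general technique (whole-phrase, case-insensitive replacement on the transcript text), not a description of how Juno implements it. The function names and the example entries are hypothetical.

```python
import re


def replace_phrases(text: str, table: dict[str, str]) -> str:
    """Replace each key in `table` with its value, matching whole phrases
    case-insensitively. Purely deterministic: no model is involved."""
    for heard, preferred in table.items():
        pattern = r"\b" + re.escape(heard) + r"\b"
        # A function replacement avoids backslash escapes being interpreted.
        text = re.sub(pattern, lambda _m, p=preferred: p, text, flags=re.IGNORECASE)
    return text


def apply_vocabulary(text: str, vocabulary: dict[str, str]) -> str:
    """Fix terms a generic model tends to mishear (hypothetical examples below)."""
    return replace_phrases(text, vocabulary)


def expand_snippets(text: str, snippets: dict[str, str]) -> str:
    """Expand a short spoken trigger into longer saved text."""
    return replace_phrases(text, snippets)


# Hypothetical user data, for illustration only.
vocabulary = {"cube control": "kubectl"}
snippets = {"my sign off": "Thanks,\nJaski"}

cleaned = expand_snippets(apply_vocabulary("Run Cube Control then add my sign off", vocabulary), snippets)
```

Because both steps are plain table lookups, they behave the same way every time and can be tuned by editing the tables, which matches the idea of checks that sit around the model instead of inside it.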
Most tools just lean on the model. That works until you hit your exact cases - the harness around it is the part most people skip, and the part we're proudest of.
Best bit: it's open source. You can fork it and tune it for older-adult speech yourself - add the vocabulary, adjust the checks. No cloud tool lets you do that.
https://github.com/Cassini-Research/juno
https://cassiniresearch.com/products/juno/blog/
ZeroHuman.
Hey congrats on the launch @Juno team and congrats @rohanrecommends for the great hunt!
Why should someone use Juno instead of tools like Wispr Flow or MacWhisper?
Is the main difference that Juno is local-first, offline, open source, and has no subscription? Or are there also specific workflow advantages in how it handles dictation, rewriting, and typing across different Mac apps?
Juno
@byalexai thank you for your notes. On your question -
There are a lot of dictation tools now. Some are good. But the whole category quietly accepted one tradeoff: you can be private, or you can be powerful, but not both.
The good tools run in the cloud, so every word you say leaves your machine.
The private ones run on your device, but they're stripped down, because the smart stuff was always too heavy to run locally.
The accepted wisdom is that you simply can't run a real speech model and a language model on-device at the speed this needs. We decided to engineer a local speech harness.
We did the hard work to run the entire thing locally, and we refused to drop a single feature to get there.
So Juno does everything the best cloud tools do.
It turns what you said into what you meant.
It rewrites text you highlight. It makes your notes and reminders.
It reads context from what's on your screen.
It works inside every app you already use.
And it shows you the words as you speak them, which happens to be one of the oldest rules in good design. People work better when they can see what they're making as they make it.
All of it runs on your Mac. Offline.
Nothing ever leaves the machine.
Our thesis is simple. Voice is becoming the main way we talk to machines, and something that fundamental should never sit behind a monthly fee.
Local, offline, open source, and free — that's a rare combo. Does Juno work well with longer dictations, like voice memos or drafting emails, or is it more optimized for short bursts of speech?
Juno
@doganakbulut It actually excels at longer dictations! The live transcription lets you see your text as you speak, so you never lose your train of thought. Final processing just takes slightly longer for extended speeches (depending on your Mac's RAM).
I regularly use it to dictate long prompts that run well over 6-7 minutes. It’s honestly the only way I write them now.
MindReader v1
A key discussion for me was: how high can reliability be?
I give very long blind prompts (I literally close my eyes because the dancing animation distracts me), and Juno can show me the transcript, which is great.
But what if my prompt is dropped? That still happens sometimes in GPT, and I was very keen to understand the speech harness that will prevent this and other oops moments from happening.
Would love it if @jas_jaski or @dudhatparesh talk a little more about the engineering behind Juno.
Juno
@ishita8088 this is exactly what the history layer is for.
Nothing you say is ever dropped, because Juno keeps a dedicated record of it locally. Three layers, all on your machine:
the raw transcript, exactly what you said, before any cleanup
the cleaned version Juno produced from it
the original audio, kept on-device for [30 days] so you can replay it and check, yours to keep or delete
So even if a prompt glitches or you lose track of what went where, you just open the history and pull it back. You can always see exactly what you said and exactly what Juno did with it. That, more than anything, is the point of the product.
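To make the three layers concrete, here is a minimal Python sketch of what one history record might hold: the raw transcript, the cleaned text, and a pointer to the on-device audio with a retention window. The field names, the `audio_expired` helper, and the 30-day default are assumptions for illustration, not Juno's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path


@dataclass(frozen=True)
class HistoryEntry:
    raw_transcript: str   # exactly what was said, before any cleanup
    cleaned_text: str     # what the cleanup step produced from it
    audio_path: Path      # original audio, stored on-device (hypothetical path)
    created_at: datetime


def audio_expired(entry: HistoryEntry, now: datetime,
                  retention: timedelta = timedelta(days=30)) -> bool:
    """True once the audio is older than the retention window (assumed 30 days)."""
    return now - entry.created_at > retention


entry = HistoryEntry(
    raw_transcript="um so add the uh login fix to the spec",
    cleaned_text="Add the login fix to the spec.",
    audio_path=Path("history/example.wav"),
    created_at=datetime(2025, 1, 1, 9, 0),
)
```

Keeping the raw and cleaned text side by side in one immutable record is what lets a user compare what they said with what the tool wrote, and recover either one if a prompt goes missing.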