Launching today

Ghost Pepper 🌶️
100% local private AI for text-to-speech & meeting notes
137 followers
100% local private AI for text-to-speech & meeting notes
137 followers
100% private on-device voice models for speech-to-text and meeting transcription on macOS. No cloud APIs, no data leaves your machine without your explicit permission.





Ghost Pepper 🌶️
I built Ghost Pepper to be 100% private and run on local Huggingface models. I open-sourced it to get help from the community, little did I know Jesse Vincent, creator of Claude Superpowers would end up contributing more code than I (read: my Claude) did. I called it Ghost Pepper because all models run locally, no private data leaves your computer. And it's spicy to offer it open source.
Product Hunt
Ghost Pepper 🌶️
@curiouskitty Cleanup uses Qwen 3.5 LLM by default (you can pick other models in settings). You can edit the prompt but it's designed to remove filler words (um, uh, like), etc. On latency: the default 2B model takes ~1-2 seconds. The 0.8B is ~0.5s if you want faster. The 4B is ~2-4s for higher quality. The aggressive vs. faithful balance: The prompt is explicitly conservative — it tells the LLM "Do NOT delete sentences. Do NOT remove context. Do NOT summarize. If you are unsure whether to keep or delete something, KEEP IT." It only removes fillers and handles explicit corrections ("scratch that", "never mind"). The hardest part was actually getting it not to follow instructions embedded in your speech (if you dictate "What's the weather?", it passes that through verbatim as text, it doesn't try to answer the question). We have 17 eval cases specifically testing that the model doesn't break character and act like a chatbot.
There's also optional ability to include OCR as context to help with corrections: if you enable it, the cleanup model sees OCR text from your frontmost window. So if you're in Slack talking about "the JIRA ticket for Kubernetes", it can correct "Cooper Netties" → "Kubernetes" by cross-referencing what's on screen. It only uses this for disambiguation, never for rewriting.
This is the category I've been waiting for someone to take seriously. Every meeting-notes tool I've tried sends audio or transcripts to a cloud I don't control, and for anything under NDA that's a hard no. "100% local" being the headline (not a buried feature) tells me you understand the actual buyer. Question for the maker: what's the model running under the hood for the TTS side, and does it hold up on older Macs or is this an M-series-and-up product? Upvoted. Rooting for the local-first AI wave.
Ghost Pepper 🌶️
@adi46 Thanks! Speech-to-text uses WhisperKit (OpenAI's Whisper models optimized for Apple Silicon by Argmax). Default is Whisper small.en (~466MB) which gives the best accuracy/speed tradeoff. We also support Parakeet v3 for 25 languages and Qwen3-ASR for 50+. Apple Silicon (M1+) only (for now at least) — WhisperKit uses Core ML and the Neural Engine which aren't available on Intel Macs. On an M1 you get real-time transcription, M2/M3/M4 is even faster. The cleanup LLM (Qwen 3.5) also needs the Neural Engine for reasonable speed.
If you use Linux, Jesse Vincent has a great fork called Pepper-X for Linux
Ran into this building something with voice input. Had to drop cloud STT because of data policies at a couple companies I was demoing to. Local first completely changes that equation. Curious how your models handle technical vocab like camel case and library names? That's been one of the hardest parts for us.
Ghost Pepper 🌶️
@webappski This is one of the hardest problems in speech-to-text. We attack it from a few angles:
1) OCR context see comment above about the optional OCR context which can incorporate spellings from words on your front-most window.
2) Word corrections: You can add preferred transcriptions in Settings. If Whisper always hears "React Query" as "react quarry", add it once and it's fixed deterministically before the LLM even runs.
3) The cleanup LLM: The local Qwen model handles camelCase formatting, but it's hit or miss on novel library names it hasn't seen in training data. The OCR context is what really saves it — if the name is anywhere on your screen, it'll get it right.
this is super refreshing
everything going cloud-first, while privacy is becoming a bigger concern
fully local voice + transcription is a strong angle
how’s the performance compared to cloud models right now?
Ghost Pepper 🌶️
@jaka_kotnik Thanks! I haven't done a lot of benchmarking myself yet but getting a lot of anecdotal feedback that it's actually faster than products that use cloud models.
@matthartman interesting! Definitely checking it out
The local-first approach resonates deeply. I built NexClip AI with the same philosophy — video stays on your Mac, only audio is sent for AI analysis when needed.
The OCR context for disambiguation is clever. We solved a similar challenge with audio RMS data — using silence detection and sentence boundaries to create precise segment cuts instead of relying purely on transcript text.
Curious: with the 2B Qwen model running locally, how much memory overhead are you seeing during a typical 60-min meeting transcription?
Product Hunt
I've always been a bit paranoid using cloud-based apps that collect super sensitive data. I expect more open-source, on-device apps like this will rise in popularity for that reason and the ability to modify to fit inside one's infra and workflows.