Launching today

Wallie V2
The open-source AI streamer that actually feels alive
56 followers
Wallie is an open-source AI streamer that actually feels alive. It reacts to your screen, reads live chat on Twitch/YouTube/Kick, animates a Live2D avatar with real lipsync, and never repeats itself, all running locally on your machine. Swap LLM and TTS providers freely. Start free with Groq + Piper. Zero cloud lock-in.









Wallie V2
The deduplication engine catching paraphrased repetition is the detail that stood out most to me; that's a genuinely hard problem. Most AI streamers sound like a broken record within 30 minutes. How does it handle the edge case where the same topic comes up organically in chat but from a different angle: does it detect the topic or the phrasing?
Also curious about the offline Ollama + Piper stack in practice. What's the quality gap between the full cloud version and running fully local?
Wallie V2
@dani_mashael Great question. It tracks both, at different layers. The dedupe engine itself works on phrasing: bigram + trigram Jaccard similarity against a rolling buffer of recent output, with a 0.65 threshold. That catches paraphrased repetition even when the wording is different.
But above that there's a separate theme tracker that operates at the topic level: it logs the angles and framings that have already been used, so even if the phrasing is fresh, the same take on the same topic gets blocked. The two layers work independently: phrase cooldown handles "don't say those words again", theme tracker handles "don't revisit that angle again".
The edge case you're describing (same topic, different angle from chat) is handled by the intent system. Highlight chat (a message that asks a direct question or pushes a specific angle) gets barge-in priority, and the theme tracker has a separate slot for "responding to chat" vs "monologue". So if chat organically surfaces the same topic but from a new direction, Wallie engages with the new angle without it triggering the repetition block. It's not perfect: if the response ends up covering the same ground anyway, the dedupe still catches it at the output level. But the intent is there.
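For readers who want to see the phrasing layer concretely, here is a minimal sketch of n-gram Jaccard dedupe against a rolling buffer. Only the bigram + trigram Jaccard idea and the 0.65 threshold come from the reply above. Everything else is an assumption made for illustration: word-level n-grams, averaging the two scores, the buffer size of 50, and the names `RollingDeduper`, `similarity`, and `accept`. It is not Wallie's actual code.

```python
import re
from collections import deque


def ngrams(text: str, n: int) -> set:
    """Set of word n-grams in `text` (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(x: str, y: str) -> float:
    """Assumed combination: mean of the bigram and trigram Jaccard scores."""
    bigram = jaccard(ngrams(x, 2), ngrams(y, 2))
    trigram = jaccard(ngrams(x, 3), ngrams(y, 3))
    return (bigram + trigram) / 2


class RollingDeduper:
    """Hypothetical helper: rejects lines too similar to recent output."""

    def __init__(self, threshold: float = 0.65, size: int = 50):
        self.threshold = threshold          # 0.65 is the value quoted above
        self.recent = deque(maxlen=size)    # buffer size is an assumption

    def is_repeat(self, candidate: str) -> bool:
        return any(similarity(candidate, past) >= self.threshold
                   for past in self.recent)

    def accept(self, candidate: str) -> bool:
        """Record the line and return True if fresh; return False if it repeats."""
        if self.is_repeat(candidate):
            return False
        self.recent.append(candidate)
        return True
```

In use, each generated line would go through `accept()` before being spoken, and a `False` result would trigger a regeneration. The topic-level theme tracker described above is a separate layer and is not modelled here.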
The "reacts to your screen" feature is the interesting differentiator here. Most AI streamers just respond to chat, which is a solved problem. An avatar that can comment on what's actually happening in the game or on screen is a different kind of presence. Curious how the screen reading works: is it vision model calls on a frame interval or something else? And what does the latency look like between something happening on screen and Wallie actually reacting to it?
Wallie V2
@ansari_adin Vision runs on a frame interval, yeah. mss captures the screen, perceptual hash (pHash) detects meaningful changes, and if the delta clears the threshold, it fires a vision model call with the current frame. The interval and sensitivity are configurable from the dashboard.
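To make the change-detection step concrete, here is a rough pure-Python sketch of a DCT-based perceptual hash compared by Hamming distance. It assumes the frame has already been captured and downscaled to a small square grayscale grid (for example 32x32); the capture itself is left out. The function names, the 8x8 hash size, and the threshold of 10 bits are illustrative assumptions, not Wallie's actual settings.

```python
import math


def dct_1d(values):
    """Unnormalized DCT-II of a sequence (scaling does not affect the hash)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def phash(gray, hash_size=8):
    """DCT-based perceptual hash of a square grayscale grid (list of rows).

    Returns hash_size * hash_size bits as a list of 0/1 values.
    """
    rows = [dct_1d(row) for row in gray]          # DCT along each row
    cols = [dct_1d(col) for col in zip(*rows)]    # then along each column
    # Keep only the low-frequency corner of the 2D spectrum.
    low = [cols[u][v] for u in range(hash_size) for v in range(hash_size)]
    median = sorted(low)[len(low) // 2]
    return [1 if c > median else 0 for c in low]


def hamming(a, b):
    """Number of bit positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def frame_changed(prev_hash, new_hash, threshold=10):
    """True when enough bits differ; the threshold value is a placeholder."""
    return hamming(prev_hash, new_hash) >= threshold
```

A capture loop would hash each new frame, compare it with the last hash that triggered a vision call, and only send the frame to the vision model when `frame_changed` returns `True`. A uniform brightness shift mostly affects the lowest-frequency (DC) coefficient, so hashes of this kind tend to be insensitive to it.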
Latency from screen event to spoken reaction: typically 2–4 seconds end to end, depending on the LLM provider. Groq + Llama-4 Scout gets you the fastest loop (~1.5–2s). Claude Sonnet is slower on raw latency but produces better reactions, especially for things like recognizing game UI, character names, or anything that requires IP/context knowledge.
The attention engine also means not every screen change triggers a full reaction. The model probabilistically assigns DEEP (22%), GLANCE (28%), TANGENT (5%), IGNORE (27%), or SILENCE (18%), so Wallie doesn't spam reactions to every mouse move, which makes the ones that do happen feel more considered. Streak fatigue prevents the same reaction type from firing back-to-back.
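One way to picture the attention split is as a weighted draw over the five modes. The sketch below uses the percentages quoted above; treating "streak fatigue" as simply zeroing the previous mode's weight is an assumption (the real rule may be softer or may work differently), and `pick_mode` is a hypothetical name.

```python
import random

# Base probabilities quoted in the reply above (they sum to 1.0).
BASE_WEIGHTS = {
    "DEEP": 0.22,
    "GLANCE": 0.28,
    "TANGENT": 0.05,
    "IGNORE": 0.27,
    "SILENCE": 0.18,
}


def pick_mode(last_mode=None, rng=random):
    """Sample an attention mode, excluding the one chosen last time.

    Zeroing the previous mode's weight is an assumed, simplest-possible
    reading of "streak fatigue".
    """
    modes = list(BASE_WEIGHTS)
    weights = [0.0 if m == last_mode else BASE_WEIGHTS[m] for m in modes]
    return rng.choices(modes, weights=weights, k=1)[0]
```

Calling `pick_mode(last_mode)` once per detected screen change, and feeding the result back in as `last_mode`, gives a sequence where no mode appears twice in a row while the remaining modes keep their relative odds.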
Super work! Is it possible to integrate it with the Gemini Live model?
Wallie V2
@ashishkingdom Gemini is already supported as an LLM provider (Gemini 2.5 Flash and Pro, streaming + vision). You set it from the dashboard: Engine → provider: gemini, then pick your model.
Gemini Live specifically (the real-time audio/multimodal API) isn't integrated yet; that's a different API surface from the standard completions endpoint Wallie uses. It's on the roadmap conceptually (the "Hearing" item: real-time audio input), but the current TTS pipeline and single-history orchestrator design would need some rethinking to accommodate it cleanly. If you're interested in contributing, that'd be a solid PR to open.