Gabber - Build Realtime AI Apps That Can See, Hear, Speak, And Do
Gabber is a realtime inference and orchestration engine for building multimodal AI apps that see, hear, speak, and act. It coordinates models, media streams (mics, screens, cameras), and tools/MCPs across multiple participants and inputs.


Hey Product Hunt!
We’re Jack, Brian, and Neil, the team behind Gabber, a realtime, multimodal AI engine to build apps that see, hear, speak, and do things.
Today we’re launching Gabber Cloud, our batteries-included hosted orchestration and inference platform that makes prototyping and productionizing realtime AI apps, agents, and personas fast and easy.
Open-source repo
Launch video
Resources To Get Started
Getting Started Demo
AI Personal Trainer/Rep Counter Tutorial
Simple Vision + Voice Demo
What Gabber Does
Gabber is the infrastructure layer for realtime, multimodal AI. Build AI that doesn’t just respond, but sees and acts.
With Gabber, developers can build apps that process audio, video, and text streams together, make decisions in realtime, and call tools/MCPs to act on what they perceive.
It’s like a nervous system for realtime AI apps — orchestration and inference all in one graph-based engine.
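To make the graph idea concrete, here's a toy Python sketch of the kind of flow Gabber orchestrates: audio in, transcription, an LLM decision, a tool call, speech out. Every name below is invented for illustration and is not the actual Gabber SDK; see the repo for the real API surface.

```python
# Toy graph: mic audio -> STT -> LLM -> tools -> TTS.
# Node shapes and handlers are invented to show the flow a
# graph-based engine manages; this is not Gabber's API.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Node:
    name: str
    handler: Callable[[dict], dict]
    downstream: list["Node"] = field(default_factory=list)

    def connect(self, node: "Node") -> "Node":
        self.downstream.append(node)
        return node  # return the target so connections chain

    def push(self, frame: dict) -> None:
        out = self.handler(frame)
        for node in self.downstream:
            node.push(out)

def llm_handler(frame: dict) -> dict:
    # A real engine streams tokens and emits structured tool calls;
    # here we fake one tool call based on the transcript.
    if "lights" in frame["text"]:
        return {"tool": "lights.on", "reply": "Turning the lights on."}
    return {"tool": None, "reply": f"You said: {frame['text']}"}

def tool_handler(frame: dict) -> dict:
    if frame["tool"]:
        print(f"[tool executed] {frame['tool']}")  # act on what was perceived
    return frame

stt = Node("stt", lambda f: {"text": f["audio"].removeprefix("<audio>")})
llm = Node("llm", llm_handler)
tools = Node("tools", tool_handler)
tts = Node("tts", lambda f: {"audio": f"<speech:{f['reply']}>"})

stt.connect(llm).connect(tools).connect(tts)
stt.push({"audio": "<audio>turn on the lights"})
```

The point isn't the toy handlers; it's that perception, reasoning, action, and speech live in one graph, so latency and state are handled in one place instead of across a dozen glued-together services.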
Why It’s Different
Until now, building realtime AI meant juggling a dozen APIs: STT, TTS, LLM, WebRTC, signaling, state management… and then fighting latency.
Gabber makes all of that feel like building a web app:
- 🔀 Multimodal orchestration for audio, video, and text
- 🧩 Simple SDKs (JS, React, Python) for composable agent flows
- 🗣️ Low-latency inference with co-located VLM, TTS, STT models
- 🔌 Bring your own models: works with any existing stack (see the interface sketch after this list)
- ☁️ Run locally or in Gabber Cloud to scale seamlessly from prototype to production
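On the bring-your-own-models point, the idea is that nodes depend on a small interface rather than a specific vendor, so a local model and a hosted one slot into the same graph. A minimal sketch, assuming a hypothetical speech-to-text protocol (every name below is invented, not the real SDK surface):

```python
# Toy sketch of "bring your own models": nodes code against a tiny
# interface, so any backend that satisfies it can slot in.
from typing import Protocol

class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...

class LocalWhisperStub:
    """Stand-in for a locally hosted model."""
    def transcribe(self, audio: bytes) -> str:
        return f"[local transcript of {len(audio)} bytes]"

class HostedSTTStub:
    """Stand-in for a provider API you already use."""
    def transcribe(self, audio: bytes) -> str:
        return f"[hosted transcript of {len(audio)} bytes]"

def run_turn(stt: SpeechToText, audio: bytes) -> str:
    # The surrounding graph stays identical; only the backend changes.
    return stt.transcribe(audio)

print(run_turn(LocalWhisperStub(), b"\x00" * 320))
print(run_turn(HostedSTTStub(), b"\x00" * 320))
```

Swapping a stub for a real model is then a change at the call site rather than a rewrite, which is the property that lets a local prototype move to a hosted deployment without rewiring the app.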
Use Cases
Developers are already building on Gabber to create:
- Computer-use agents that see your screen and act
- Smart home assistants that perceive and respond in real time
- Interactive NPCs and virtual companions that interact with users
- Research and screen assistants with memory and vision
- Security and monitoring tools that understand context, not just motion
- Fitness coaches that count reps, monitor form, and log workouts
What’s Next
We’re actively working on:
- New SDKs for Unity and ESP32
- More templates for common realtime app types
- Expanded model support (LLMs, TTS/STT, VLMs)
- UX and developer tooling improvements for Cloud
Everything is source-available under the same license model as n8n, so you can self-host it and build internal tools freely.
Why We’re Excited
We started Gabber after realizing every AI project, from assistants to avatars, hits the same wall: there’s no simple way to orchestrate AI applications in realtime.
Gabber removes that wall.
It gives developers the foundation to build apps that feel alive: apps that don't just talk, but listen, see, and actually do things, instantly.
We’d love to hear what you think:
What kind of realtime app would you build with Gabber?
Try it Out
🔗 Get Started
🖇️ GitHub
🗣️ Discord
Thanks for checking us out. Drop any questions below!