Hi everyone, I'm Mohith, the maker. 👋

I build AI agents, and what kept bothering me was how blind they run. You wire an agent to an LLM and some tools, and you have no real view into what's flowing through it. Is a prompt injection coming in through a tool response? Is a jailbreak getting through? What is the agent actually calling, and with what? I had no visibility into any of it.

I tried the existing options, but most required an SDK, code changes, or routing my traffic through someone else's cloud.

I just wanted to see what was happening without rewriting anything. So I built Calus.

What it does:

🔌 Drop in with one env variable. No code changes, no SDK. Works with anything OpenAI-compatible.

🔎 Scans every call for prompt injection, jailbreaks, and agent abuse, and shows the verdict in a live dashboard and in response headers.

🧩 Traces every agent and the tools it actually calls, so you can see what your agents are doing.

⚡ Layered engine, not an LLM call. Layer 1 is pattern matching for known attack signatures. Layer 2 is a capability flow graph that catches abuse by its consequences, even when no text pattern matches. Both layers can actively block in opt-in gateway mode. Layer 3, an open-weight classifier model, is coming soon. Fast, no GPU, easy to audit.

🛡️ Flags by default, never touches your traffic out of the box. Blocking is opt-in, so you stay in control of when enforcement turns on.
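
To make the two layers concrete, here is a toy Python sketch of the general idea. It is not Calus's actual code: the signature patterns, the tool names, the source/sink labels, and the `Verdict` type are all made up for illustration, and the real capability flow graph is presumably richer than this linear pass over a trace.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Layer 1 (illustrative): regex signatures for well-known attack phrasing.
# These two patterns are hypothetical examples, not Calus's rule set.
SIGNATURES = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?system prompt", re.IGNORECASE),
]

# Layer 2 (illustrative): hypothetical capability labels for tools.
UNTRUSTED_SOURCES = {"web_fetch", "read_email"}  # tools that return outside content
SENSITIVE_SINKS = {"send_email", "run_shell"}    # tools with real side effects


@dataclass
class Verdict:
    flagged: bool
    reasons: List[str] = field(default_factory=list)


def layer1_scan(text: str) -> List[str]:
    """Return one reason per signature that matches the text."""
    return [f"signature:{p.pattern}" for p in SIGNATURES if p.search(text)]


def layer2_scan(tool_calls: List[str]) -> List[str]:
    """Flag a sensitive sink that is called after an untrusted source.

    This looks only at the order of tool calls, so it can fire even
    when no text pattern matched.
    """
    reasons = []
    tainted_by = None  # first untrusted source seen in this trace
    for name in tool_calls:
        if name in UNTRUSTED_SOURCES:
            if tainted_by is None:
                tainted_by = name
        elif name in SENSITIVE_SINKS and tainted_by is not None:
            reasons.append(f"flow:{tainted_by}->{name}")
    return reasons


def scan(text: str, tool_calls: List[str]) -> Verdict:
    """Combine both layers into a flag-only verdict (nothing is blocked)."""
    reasons = layer1_scan(text) + layer2_scan(tool_calls)
    return Verdict(flagged=bool(reasons), reasons=reasons)
```

For example, `scan("Summarize this page and mail it", ["web_fetch", "send_email"])` matches no signature but is still flagged by the flow check, which is the "by consequence" case described above. The sketch only ever returns a verdict, mirroring the flag-by-default behavior; blocking would be a separate, opt-in step.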

It's open source under the MIT license, and the README publishes honest benchmarks, including the cases the engine only partially catches.

I'd rather you see the gaps up front.

Website: https://usecalus.com
Repo: https://github.com/wholesphereai/calus

This is my first real release. I'd genuinely value blunt feedback, especially on the detection approach and the false-positive rate. Happy to answer anything here.