
OpenInterpretability
Open-source toolkit to audit what your LLM knows
The first mech interp toolkit that runs inside Claude Code, Cursor, and Cline via MCP. Production probes (FabricationGuard, agent-probe-guard) catch hallucinations and agent failures. Also includes the ProbeBench leaderboard and SAE training that scales from a 30-minute free Colab run to paper-grade experiments. Apache-2.0.





Hey PH, Caio here, maker of OpenInterpretability.
When something breaks inside an LLM app (hallucination, silent agent failure, "works on prompt A but not on prompt B"), you usually have no way to see inside the model. Mech interp can answer those questions, but the tools have been research-only: H100s, deep domain knowledge, weeks of setup.
So I built the first mech interp MCP server. It plugs straight into Claude Code, Cursor, and Cline. Once installed, your AI assistant can call interpretability tools directly during a session: capture activations, look up SAE features, run probes, test causal interventions. No separate notebook, no context switch.
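For readers new to the term, here is a rough idea of what "looking up SAE features" means. This is a generic, minimal sketch of a sparse autoencoder encoder step, not the OpenInterpretability API: the function names (`sae_encode`, `top_features`) and the tiny hand-written weights are assumptions made up for illustration.

```python
# Hypothetical sketch: NOT the openinterp API. A sparse autoencoder (SAE)
# maps a model activation vector to feature activations via
# f = ReLU(W_enc @ (x - b_dec) + b_enc); the largest entries of f are the
# features that are "active" for this input.

def sae_encode(x, w_enc, b_enc, b_dec):
    """Return SAE feature activations for one activation vector x."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    feats = []
    for row, bias in zip(w_enc, b_enc):
        pre = sum(w * c for w, c in zip(row, centered)) + bias
        feats.append(max(0.0, pre))  # ReLU keeps only positively firing features
    return feats

def top_features(feats, k):
    """Return the k (feature_index, activation) pairs with the largest activation."""
    ranked = sorted(enumerate(feats), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Toy weights (assumed): 3 features over a 2-dimensional activation.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
B_ENC = [0.0, 0.0, 0.0]
B_DEC = [0.0, 0.0]

activation = [2.0, 0.5]
features = sae_encode(activation, W_ENC, B_ENC, B_DEC)
top2 = top_features(features, k=2)
print(top2)
```

In a real setup the weights would come from a trained SAE and each feature index would map to a human-readable label; the toy numbers here only show the mechanics.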
→ One-line install: openinterp.org/start
Two production probes ship with it today:
- FabricationGuard: drop-in hallucination detector on Qwen3.6-27B. → openinterp.org/products/fabricationguard
- agent-probe-guard: detects silent coding-agent failures with Qwen3.6-27B. ~18% budget cut at 86% accuracy.
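If "probe" is unfamiliar: in general terms, a probe is a small classifier trained on a model's internal activations to predict some property, such as whether an answer is fabricated. The sketch below is a generic logistic-regression probe on synthetic vectors, assuming nothing about how FabricationGuard or agent-probe-guard are actually built; every name and the data itself are made up for illustration.

```python
# Hypothetical sketch: a generic linear probe, NOT the FabricationGuard
# implementation. Synthetic "activations" stand in for real model states.
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def probe_score(w, b, x):
    """Probability (0..1) that activation x belongs to the positive class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_probe(xs, ys, lr=0.1, epochs=50):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            grad = probe_score(w, b, x) - y  # d(log-loss)/d(logit)
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def make_data(rng, n, dim=8):
    """Synthetic activations: the label shifts the first coordinate by +/-3."""
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += 3.0 if y == 1 else -3.0
        xs.append(x)
        ys.append(y)
    return xs, ys

rng = random.Random(0)
train_x, train_y = make_data(rng, 200)
test_x, test_y = make_data(rng, 100)

w, b = train_probe(train_x, train_y)
predictions = [1 if probe_score(w, b, x) >= 0.5 else 0 for x in test_x]
accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

The appeal of this kind of detector is that it only needs a dot product per input at inference time, which is why it can run alongside the model as a guard.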
→ pip install openinterp
All Apache-2.0.
What I'd love feedback on:
- Which IDE workflow would you want this in next?
- What LLM failure mode do you wish you could actually see into?
Happy to answer anything.
Researchers investigating causal mechanisms can use the OpenInterpretability MCP server to connect Claude Code, Cursor, or Cline to GPUs on Google Colab and do "vibe research" from inside their editor.