Launching today

OpenInterpretability
Open-source toolkit to audit what your LLM knows
3 followers
The first mech interp toolkit that runs inside Claude Code, Cursor, and Cline via MCP. Production probes (FabricationGuard, agent-probe-guard) catch hallucinations and agent failures. ProbeBench leaderboard, plus SAE training that scales from a 30-minute free Colab to paper-grade runs. Apache-2.0.

Hey PH, Caio here, maker of OpenInterpretability.
When something breaks inside an LLM app (a hallucination, a silent agent failure, "works on prompt A but not on prompt B"), you usually have no way to see inside the model. Mech interp can answer those questions, but the tools have been research-only: H100s, deep domain knowledge, weeks of setup.
So I built the first mech interp MCP server. It plugs straight into Claude Code, Cursor, and Cline. Once installed, your AI assistant can call interpretability tools directly during a session: capture activations, look up SAE features, run probes, test causal interventions. No separate notebook, no context switch.
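For Claude Code, MCP servers are typically registered in a project-level `.mcp.json` file. Here's a sketch of what that registration might look like; the server name and launch command below are assumptions for illustration, so check openinterp.org/start for the actual one-liner:

```json
{
  "mcpServers": {
    "openinterp": {
      "command": "python",
      "args": ["-m", "openinterp.mcp"]
    }
  }
}
```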
→ One-line install: openinterp.org/start
Two production probes ship with it today:
FabricationGuard: drop-in hallucination detector built on Qwen3.6-27B. → openinterp.org/products/fabricationguard
agent-probe-guard: detects silent coding-agent failures, also built on Qwen3.6-27B. ~18% budget cut at 86% accuracy.
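For anyone new to the technique: both products are activation probes, i.e. small classifiers trained on a model's hidden states to read out a property like "is this claim grounded?". A minimal self-contained sketch of the idea (synthetic activations and a hand-rolled logistic-regression probe; this is not the OpenInterpretability API, which may look different):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # hidden size of the (hypothetical) layer we probe

# Synthetic stand-in for captured activations: "grounded" vs
# "fabricated" examples separated along one random direction,
# the way a real concept can be linearly readable in a model.
direction = rng.normal(size=D)
direction /= np.linalg.norm(direction)
grounded = rng.normal(size=(200, D)) + 1.5 * direction
fabricated = rng.normal(size=(200, D)) - 1.5 * direction

X = np.vstack([grounded, fabricated])
y = np.array([1] * 200 + [0] * 200)

# Train a logistic-regression probe with plain gradient descent.
w, b = np.zeros(D), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

acc = np.mean(((X @ w + b) > 0) == (y == 1))
print(f"probe accuracy: {acc:.2f}")
```

The production probes do the same thing in spirit, just on real activations captured from the model while it answers.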
→ pip install openinterp
All Apache-2.0.
What I'd love feedback on:
- Which IDE workflow would you want this in next?
- What LLM failure mode do you wish you could actually see into?
Happy to answer anything.