OpenInterpretability

Open-source toolkit to audit what your LLM knows

3 followers

AI Infrastructure Tools • AI Metrics and Evaluation • LLM Developer Tools

The first mech interp toolkit that runs inside Claude Code, Cursor, and Cline via MCP. Production probes (FabricationGuard, agent-probe-guard) catch hallucinations and agent failures. Also includes the ProbeBench leaderboard and SAE training that scales from a 30-minute free Colab run to paper-grade experiments. Apache-2.0.

Free

Launch tags: Open Source • Developer Tools • Artificial Intelligence

Maker

Hey PH, Caio here, maker of OpenInterpretability.

When something breaks inside an LLM app — hallucination, silent agent failure, "works on prompt A but not on prompt B" — you usually have no way to see inside the model. Mech interp can answer those questions, but the tools have been research-only: H100s, deep domain knowledge, weeks of setup.

So I built the first mech interp MCP server. It plugs straight into Claude Code, Cursor, and Cline. Once installed, your AI assistant can call interpretability tools directly during a session — capture activations, look up SAE features, run probes, test causal interventions. No separate notebook, no context switch.
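For readers new to the term, here is a minimal sketch of what "running a probe" on captured activations generally means: a small linear classifier trained on hidden-state vectors to predict a label. This is not the OpenInterpretability API. The activations are synthetic NumPy data, and `train_linear_probe` and `probe_predict` are hypothetical names used only for illustration.

```python
import numpy as np

def train_linear_probe(acts, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe on activation vectors via gradient descent."""
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = acts @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        err = probs - labels              # per-example gradient of cross-entropy w.r.t. logits
        w -= lr * (acts.T @ err) / n
        b -= lr * err.mean()
    return w, b

def probe_predict(acts, w, b):
    """Return 1 where the probe's logit is positive, else 0."""
    return (acts @ w + b > 0).astype(int)

# Synthetic stand-in for captured activations (assumption: real ones would
# come from a model's residual stream at some layer).
rng = np.random.default_rng(0)
d = 64
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
labels = rng.integers(0, 2, size=400)
# Positive examples are shifted along one hidden direction.
acts = rng.normal(size=(400, d)) + 4.0 * np.outer(labels, direction)

# Train on the first 300 examples, evaluate on the held-out 100.
w, b = train_linear_probe(acts[:300], labels[:300].astype(float))
accuracy = (probe_predict(acts[300:], w, b) == labels[300:]).mean()
print(f"held-out probe accuracy: {accuracy:.2f}")
```

If a probe this simple separates the two classes on held-out data, that suggests the property is linearly readable from the activations at that layer, which is the basic idea behind probe-based detectors.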

→ One-line install: openinterp.org/start

Two production probes ship with it today:

FabricationGuard: drop-in hallucination detector on Qwen3.6-27B. → openinterp.org/products/fabricationguard

agent-probe-guard: detects silent coding-agent failures on Qwen3.6-27B. ~18% budget cut at 86% accuracy.
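The post does not say how "budget cut" and "accuracy" are measured. One plausible reading, sketched below under that assumption, is that a probe scores each agent run early, runs flagged as likely failures are aborted, and the saved cost is compared with how often the flag was right. The `Run` fields, `evaluate_guard`, the `abort_fraction` parameter, and all numbers are hypothetical toy values, not results from agent-probe-guard.

```python
from dataclasses import dataclass

@dataclass
class Run:
    probe_score: float  # hypothetical probe output: estimated chance the run fails
    failed: bool        # ground truth, known only after the run would finish
    cost: float         # what the full run would cost (tokens, dollars, ...)

def evaluate_guard(runs, threshold, abort_fraction=0.2):
    """Abort runs scoring at or above threshold; report accuracy and budget saved.

    abort_fraction is the assumed share of a run's cost already spent
    when the guard fires.
    """
    correct = 0
    spent = 0.0
    total = 0.0
    for r in runs:
        flagged = r.probe_score >= threshold
        correct += int(flagged == r.failed)
        total += r.cost
        spent += r.cost * abort_fraction if flagged else r.cost
    return {"accuracy": correct / len(runs), "budget_saved": 1 - spent / total}

# Toy data only: five runs of equal cost.
runs = [
    Run(0.9, True, 100.0),
    Run(0.2, False, 100.0),
    Run(0.7, False, 100.0),
    Run(0.1, False, 100.0),
    Run(0.3, True, 100.0),
]
result = evaluate_guard(runs, threshold=0.5)
print(result)
```

Raising the threshold aborts fewer runs, which usually trades budget savings for fewer false alarms, so a reported pair like "X% budget cut at Y% accuracy" corresponds to one point on that trade-off curve.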

→ pip install openinterp

All Apache-2.0.

What I'd love feedback on:
- Which IDE workflow would you want this in next?
- What LLM failure mode do you wish you could actually see into?

Happy to answer anything.

2mo ago

Maker

Researchers investigating causal mechanisms inside a model can use the OpenInterpretability MCP server to connect Claude Code, Cursor, or Cline to GPUs on Google Colab and do "vibe research".

2mo ago

Forum Threads

p/openinterpretability • 2mo ago

What's the most painful LLM failure you've ever debugged?

We've all hit that moment: the model works on prompt A, breaks silently on prompt B, and there's no log line, no stack trace, no clue what changed inside the model.

I'm launching OpenInterpretability in a few hours on Product Hunt. It's a mech interp toolkit that runs inside Claude Code, Cursor, and Cline via MCP, plus drop-in probes for hallucination and agent-failure detection. The project started because I needed a way to see what was happening when a coding agent kept making the same silent tool-call mistake.
