Luc B. Perussault Diallo

AI coding tools don't understand your codebase. 40+ projects are trying to change that.

AI coding tools are brilliant at generating code. They're blind when it comes to understanding your codebase.

They can't tell you who calls a function. They don't know what breaks if you change something. They have no idea how your team actually writes code. Every session, they start from scratch: grep, read files, guess, burn tokens.

A whole category is emerging to fix this. I've been tracking it for a few weeks and found 40+ tools that appeared in the last 6 months alone. They fall into three approaches:

  • Context packing: flatten the repo into text and paste it in (Repomix, GitIngest)

  • Embeddings: semantic search, which is what Cursor and Copilot do under the hood

  • Structural graphs: parse symbols and relationships, persist them, and query via MCP
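To make the structural-graph idea concrete, here is a minimal sketch using Python's built-in ast module instead of Tree-sitter, on a made-up four-function snippet. It is not how any tool mentioned here works. It only shows that once calls are parsed into a graph, "who calls this?", "what breaks if I change this?", and "what looks dead?" become cheap lookups. All function names are hypothetical.

```python
import ast
from collections import defaultdict, deque

# Hypothetical sample code standing in for a real repository.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(path):
    return load(path).split()

def report(path):
    return len(parse(path))

def unused():
    return 0
'''

def build_call_graph(source):
    """Map each top-level function name to the set of plain names it calls."""
    calls = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for child in ast.walk(node):
                # Only direct calls like f(x); method calls are ignored here.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    callees.add(child.func.id)
            calls[node.name] = callees
    return calls

def callers_of(calls, target):
    """Functions that call target directly."""
    return sorted(fn for fn, callees in calls.items() if target in callees)

def blast_radius(calls, target):
    """Functions that call target directly or transitively."""
    reverse = defaultdict(set)
    for fn, callees in calls.items():
        for callee in callees:
            reverse[callee].add(fn)
    seen, queue = set(), deque([target])
    while queue:
        for caller in reverse[queue.popleft()]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return sorted(seen)

def never_called(calls):
    """Dead-code candidates: defined functions that nothing here calls."""
    called = set().union(*calls.values()) if calls else set()
    return sorted(fn for fn in calls if fn not in called)

graph = build_call_graph(SOURCE)
print(callers_of(graph, "load"))    # ['parse']
print(blast_radius(graph, "load"))  # ['parse', 'report']
print(never_called(graph))          # ['report', 'unused']
```

Note that report shows up as never called because it is an entry point in this toy example, so the last query yields candidates, not verdicts. A real tool would also need to handle methods, imports, and multiple files.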

Why the explosion? Three things converged. MCP gave AI tools a universal way to query external services. Tree-sitter reduced multi-language parsing to a single dependency. Quantized embeddings made local semantic search possible without API keys.
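For a sense of what "query via MCP" looks like on the wire: MCP is built on JSON-RPC 2.0, and a client invokes a server's tool with a tools/call request. The sketch below only builds that message. The tool name find_callers and its symbol argument are hypothetical, not taken from any tool mentioned here.

```python
import json

def tool_call_request(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 message for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# "find_callers" and "symbol" are made-up names for illustration only.
request = tool_call_request(1, "find_callers", {"symbol": "parse_config"})
print(json.dumps(request, indent=2))
```

The point is that the AI tool sends one small structured request and gets a precise answer back, instead of grepping and reading files to reconstruct the same fact.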

The token economics alone are wild. I saw a dead code search burn 56 tool calls and nearly a million tokens. Same search with a structural graph: 3 calls, 60 seconds. Same model. Same question.

A few observations from mapping the space:

  • 🧱 Deployment friction kills adoption. Tools that ship as a single binary get tried. Tools that need Python + Docker + Ollama get bookmarked and forgotten.

  • 🎛️ The scope question is unresolved. Some tools expose 100+ MCP capabilities. Others expose 4. Nobody knows the sweet spot yet.

  • ⚠️ IDE absorption risk is real. JetBrains ships a built-in MCP server starting with 2025.2, and every standalone tool in this space is exposed to that kind of absorption.

Full disclosure: I'm building here too. Sense combines a structural graph, semantic search, blast radius, and convention detection in a single Go binary exposed via MCP. It's pre-alpha and open source: https://luuuc.github.io/sense

I wrote up the full landscape, with an evaluation framework, if you want the deep dive: https://medium.com/@lucdiallo/codebase-intelligence-in-the-age-of-ai-a-map-of-the-space-5fa7d349887d

What are you using to give your AI tools better codebase understanding? Anything working well that I missed?
