While building Sverklo (launching tomorrow on Product Hunt), I ran a structured dogfooding protocol: I used the tool on its own codebase to find real bugs before users did.
Found 4 integration-level bugs that unit tests missed:
Impact analysis silently dropped repeat call sites: the worst possible failure for a refactor-safety tool
Reference search returned 48 substring matches, drowning out the 5 real references
Lookup returned "No results" on valid queries instead of explaining why
A parser off-by-one error skipped every function after the first in multi-function files
All fixed, regression-tested, and documented in a full, unedited session log.
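Two of these failure modes are easy to reproduce in miniature. The Python sketch below is an illustration under stated assumptions, not Sverklo's code. The `CallSite` type and the functions `find_references_naive`, `find_references`, `dedupe_by_caller`, and `dedupe_by_location` are hypothetical. They show how a plain substring check inflates reference counts, and how keying call sites on the caller's name silently drops a second call made from the same function.

```python
import re
from typing import NamedTuple


class CallSite(NamedTuple):
    # Hypothetical record for one call to the symbol being analyzed.
    file: str
    line: int
    caller: str


def find_references_naive(name: str, lines: list[str]) -> list[int]:
    # Substring check: "parse" also matches "parseFile" and "reparse".
    return [i for i, text in enumerate(lines, start=1) if name in text]


def find_references(name: str, lines: list[str]) -> list[int]:
    # Whole-identifier match: \b requires a non-word character (or line edge)
    # on both sides, so "parseFile" and "reparse" are rejected.
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    return [i for i, text in enumerate(lines, start=1) if pattern.search(text)]


def dedupe_by_caller(sites: list[CallSite]) -> list[CallSite]:
    # Buggy: keyed on the caller, so a repeat call from the same function is lost.
    seen: dict[str, CallSite] = {}
    for site in sites:
        seen.setdefault(site.caller, site)
    return list(seen.values())


def dedupe_by_location(sites: list[CallSite]) -> list[CallSite]:
    # Keyed on (file, line): every physical call site survives,
    # and only true duplicates of the same location are merged.
    seen: dict[tuple[str, int], CallSite] = {}
    for site in sites:
        seen.setdefault((site.file, site.line), site)
    return list(seen.values())
```

With a short symbol name, the naive version matches every line that merely contains the text, while the boundary-aware version keeps only real uses. Likewise, two calls from `main` collapse into one under `dedupe_by_caller`, which is exactly the kind of silent omission that makes a refactor look safer than it is.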