Session reports for Claude Code & Codex to improve your code

Spotlight by Backplanes - Session reports for Claude Code & Codex to improve your code

Kilo Code

•2mo ago

Keep up with your agents. Spotlight reads your Claude Code and Codex sessions and shows you what your agents actually did, and how to get recursively better every session: what to fix now, what to ship better next time, what's worth sharing. One harness or seven, solo or across your team. Free.

Replies

Well done! Was post-session reporting a deliberate call over an inline guardrail that interrupts the agent mid-write (i.e. less intrusive, keeps you in flow)?

2mo ago

Maker

@artstavenka1 Thanks, Art! Definitely deliberate, but one (good!) clarification: it's not limited to being post-session. Spotlight reports actually build while you work, just minutes behind your agent, so you're not waiting for a session to end to find out what happened.

What we deliberately avoided is the inline path. A guardrail that interrupts mid-write has to sit between you and the model, with the latency and potential breakage that path implies, and we very much want to help keep you in flow. :)

2mo ago

The scary part of vibe coding fast isn't the bug you catch, it's the secret you committed three sessions ago and never noticed. I spent years in risk and security before I ever touched Claude Code, so "what my agent actually did" is exactly the report I always wished I had. Does Spotlight call out the security stuff specifically, leaked keys, missing checks, or is it more about code quality and patterns?

2mo ago

Maker

@luca_capone, "the secret you committed three sessions ago" is almost word-for-word how Spotlight actually started for us: I asked Claude to fix one file, and an API key ended up in a tracked .env that we only caught by accident. So yes, security is very much called out specifically, and it leads the report rather than riding along. Findings arrive severity-ordered in their own stream: secrets landing in files git tracks, prod-touching commands that skipped a dry run, an agent quietly reaching a service you've never used, each with the evidence behind it and a concrete fix.

Code quality and patterns that can help make you more effective with your harness are in there too, so the report always gives you value, even when there are no security-related findings. And since those transcripts are already sitting on your machine, the first report can start with the sessions you've already run. Your "three sessions ago" is still catchable. :)
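
For readers who want a concrete picture of "secrets landing in files git tracks, severity-ordered", here is a minimal sketch. It is not Spotlight's implementation: the `SECRET_RE` pattern, the `Finding` shape, the numeric severities, and the `scan_writes` helper are all assumptions made up for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: a key-like name assigned a long token.
# Real secret detectors use far more rules than this.
SECRET_RE = re.compile(
    r"(api[_-]?key|secret|token)\s*[=:]\s*([A-Za-z0-9_-]{16,})",
    re.IGNORECASE,
)

@dataclass
class Finding:
    severity: int   # assumption: higher number = more urgent
    path: str
    evidence: str   # the secret value itself is never stored here

def scan_writes(writes, tracked_paths):
    """Flag secret-shaped assignments in written files, most severe first.

    writes: iterable of (path, content) pairs taken from a session.
    tracked_paths: set of paths that git tracks.
    """
    findings = []
    for path, content in writes:
        for match in SECRET_RE.finditer(content):
            # A secret in a tracked file ends up in git history, so it ranks higher.
            severity = 3 if path in tracked_paths else 1
            findings.append(Finding(severity, path, f"{match.group(1)}=<masked>"))
    return sorted(findings, key=lambda f: f.severity, reverse=True)
```

In practice the tracked set could come from `git ls-files`; it is passed in here so the sketch stays self-contained.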

2mo ago

Kilo Code

Hunter

FWIW you can see a sample report here: https://www.backplanes.com/features/session-reports

2mo ago

Spotlight by Backplanes

Maker

@luca_capone exactly! Would love for you to give Spotlight a spin and tell us what lands for you, and what we could improve!

2mo ago

The OS-level instrumentation approach is smart. It captures what agents actually do rather than what they report back. We've run into exactly this problem: an agent silently inlining an API key when it couldn't find the env var, and that key landing in git history. How do you distinguish intentional credential usage in test fixtures from actual leakage?

2mo ago

Maker

@anand_thakkar1, that war story could be one of ours: an agent improvising around a missing env var by quietly inlining the key is the kind of move nobody catches in review. One small correction: it's not OS-level instrumentation. Spotlight reads only the session transcripts the harnesses themselves write, nothing else on your machine. But your key insight holds: it's what the tools actually did, not what the agent says it did.

On fixtures vs leakage, we treat those as two different jobs. Redaction is deliberately paranoid: anything secret-shaped gets masked on your machine before upload, fixtures included, because that step shouldn't be guessing intent. The judgment lives in the analysis: where the credential landed, whether it looks live, and what the session was doing at the time, with severity reflecting that context. A dummy key in a test fixture and a live-looking key written into a tracked file are very different findings. Every finding carries its evidence, so on close calls you're the judge, with the receipts in front of you.

We'd rather flag a fixture at low severity than miss a live key. That asymmetry is on purpose.
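
A rough sketch of that two-job split, with every name and threshold invented for illustration (the `SECRET_SHAPED` pattern, `redact`, `classify`, and the path heuristics are assumptions, not Spotlight's actual rules): redaction masks anything secret-shaped without guessing intent, while classification uses context to set severity and never drops a finding.

```python
import re

# Assumption: treat any long token-like run as secret-shaped.
# Deliberately over-eager; it will also mask long harmless identifiers.
SECRET_SHAPED = re.compile(r"[A-Za-z0-9_-]{20,}")

def redact(text: str) -> str:
    """Mask every secret-shaped run. No intent guessing at this step."""
    return SECRET_SHAPED.sub("[REDACTED]", text)

def classify(path: str, tracked: bool, looks_live: bool) -> str:
    """Pick a severity from context. Every case is still reported."""
    in_fixture = path.startswith("tests/") or "fixtures/" in path
    if looks_live and tracked and not in_fixture:
        return "high"    # live-looking key written into a tracked file
    if looks_live:
        return "medium"  # live-looking, but in a fixture or untracked file
    return "low"         # dummy-looking key: flagged rather than dropped
```

The `low` branch is where the asymmetry shows up: a fixture key costs the reader a glance, while a missed live key costs a rotation.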

2mo ago

Kilo Code

Hunter

Thanks for the support! Let's spread the word on LinkedIn, repost this

2mo ago

Been waiting for a product like this to arrive. Coding agents broke the feedback loop that used to make engineers better. You don't write the code, you don't review most of it, so where does the learning come from? Everything in the stack accelerates output; nothing closes the loop back to the human. Session reports as a feedback mechanism (not just an audit trail) is the right shape for that. The "what's worth keeping" part matters more than the scary findings, IMO.

2mo ago

Kilo Code

Hunter

@kcpike, love this framing! S/O to the makers for building this

2mo ago

Maker

@kcpike "Nothing closes the loop back to the human" is the cleanest statement of the problem I've seen yet. :) That's exactly it: review used to be where engineers got better, and agents quietly took that away while speeding everything else up.

2mo ago

Spotlight by Backplanes

Maker

@kcpike 1000%! Everything is changing under our feet at a crazy rate, so the faster we learn and improve, the better we get and the faster we all go! Recursively compounding returns!

2mo ago

This is rad. Kind of terrifying to see how much some of my sessions cost!

2mo ago

Maker

@yrechtman "Rad and kind of terrifying" is the almost universal reaction so far. ;-)

It's the bar tab after a great night: the fun was real, and now there's an itemized record of exactly how.

The difference is this tab works for you. It shows what drove the cost, and the biggest line items usually turn out to be the easiest fixes: the same dead end paid for ten times, an agent left grinding away on the wrong thing.

Curious to hear if your next session's number behaves.

2mo ago

This hits a real blind spot with coding agents. They can move fast, but knowing what they quietly touched, broke, or exposed afterward feels just as important as the code they shipped.

2mo ago

Maker

Thanks, @farrukh_butt1! "Quietly" is certainly the operative word: the touching, breaking, and exposing all happen mid-flow, in the part nobody's watching, while the shipped code gets all the attention. :)

2mo ago

Forage Mail

Spotlight replaced a bespoke string of skills I would have to run by hand. Super helpful!

2mo ago

Spotlight by Backplanes

Maker

@richiebonilla agreed! We think there's a real opportunity here for Spotlight to help build better skills for engineers on the fly, based on the content of these reports. We're excited to make engineers who use coding harnesses faster, safer, and more cost-effective!

2mo ago

Maker

@richiebonilla, "a bespoke string of skills run by hand" is exactly the ritual we kept finding, and living ourselves. Replacing yours is about the highest compliment a tool like this can get, so thank you.

2mo ago

Upstream

looks really cool! Gonna take it for a spin

2mo ago

Kilo Code

Hunter

@louislecat lfg! here you go: backplanes.com

looking forward to your thoughts

2mo ago

Spotlight by Backplanes

Maker

@louislecat amazing! excited to hear what you think and what you'd like us to build next!

2mo ago

This is a useful direction. For coding agents, the hard part is usually not generating more code, it is making the session reviewable afterward.

The report I’d want is pretty boring: changed files, risky assumptions, tests/checks run, failed attempts, and a short “what a human should look at first” section.

2mo ago

Maker

@kevinzrzgg, you seem to have written our report spec almost exactly. :)

Changed files: all files read and written are in there. Tests and checks: flagged with their outcomes when the session shows them. Failed attempts: called out, including the distinction between deliberate re-verification and flailing retries. "What a human should look at first": that's the top of the report, a one-line verdict with the main outcome, then findings ordered by severity and guidance on what to do for each.

The one we can only claim half credit on is risky assumptions: concrete risky choices surface as findings, and a blind-spots section names what the report couldn't verify, but a dedicated assumptions section is a great idea.

And we're with you on boring: the standing rule inside the report is no invented findings and no padded advice. An empty section beats a manufactured one.
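
To make that report shape concrete, here is a small sketch of the structure described in this reply: a one-line verdict on top, findings ordered by severity with guidance, a blind-spots section, and empty sections left empty rather than padded. The `SessionReport` and `Finding` classes and the rendered layout are hypothetical, not the actual report format.

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    severity: int   # assumption: higher number = look at this first
    summary: str
    guidance: str

@dataclass
class SessionReport:
    verdict: str
    findings: list[Finding] = field(default_factory=list)
    blind_spots: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Verdict first, then findings by severity, then blind spots."""
        lines = [f"Verdict: {self.verdict}", "Findings:"]
        ordered = sorted(self.findings, key=lambda f: f.severity, reverse=True)
        # An empty section is printed as "(none)" instead of being filled in.
        lines += [f"  [{f.severity}] {f.summary} -> {f.guidance}" for f in ordered] or ["  (none)"]
        lines.append("Blind spots:")
        lines += [f"  {b}" for b in self.blind_spots] or ["  (none)"]
        return "\n".join(lines)
```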

2mo ago

DiffSense

$300 for 50 min of coding. What kind of models are you running? 😅 How does it get recursively better with each session? I don't get it. Reminds me of entire.io

2mo ago

Maker

@conduit_design Ha, right?! The wild part: that's the agents' own tab, we just hand you the receipt. It's crazy how quickly token usage accelerates when you're running multiple subagents on an intensive job, and Fable pricing is going to make this even more fun for all of us soon. 😅

On "recursively better," the idea is that it's a loop with you in it. The model never changes; your setup does. Each report turns what happened into concrete and actionable advice: a fix to apply, a CLAUDE.md line to include, a Skill to draft. Your agent loads that richer setup next session and starts more informed than the last one.
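
One way to picture that loop in code, as a sketch under stated assumptions: `apply_advice` is a hypothetical helper that folds a report's suggested lines into a setup file such as CLAUDE.md, skips duplicates, and signals when the file has outgrown an arbitrary line budget. The budget and the matching rule are invented for illustration.

```python
def apply_advice(setup_lines, advice, max_lines=50):
    """Merge new advice into existing setup lines without duplicating any.

    Returns (merged_lines, needs_condensing). max_lines is an assumed budget:
    past it, the setup should be condensed or split rather than extended.
    """
    merged = list(setup_lines)
    seen = {line.strip().lower() for line in merged}
    for item in advice:
        key = item.strip().lower()
        if key and key not in seen:
            merged.append(item.strip())
            seen.add(key)
    return merged, len(merged) > max_lines
```

The model stays the same between sessions; only the list this function returns changes, and the next session starts from it.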

2mo ago

DiffSense

@gogogadgetneil Yeah, but that's a slippery slope. You can load all the prerequisite gotchas you want, but most of the time a clean slate, an undiluted context window, is what you need to solve a task. Sometimes it's the non-deterministic side of AI that solves things. Making it more deterministic can send it down the same rabbit holes that lead nowhere and actually burn more tokens, because it has to deal with so many constraints. It feels overwhelmed and does anything to pass the test, even if the result is contrived. On the bill side, I don't get how 55 min of coding now costs $300. Maybe try the Asian models? There are ways. There is a project called backdoor that uses Asian models in Claude Code etc. Personally I run agents differently altogether, for free. But we can talk about that another time :D

2mo ago

Spotlight by Backplanes

Maker

@gogogadgetneil @conduit_design The slippery slope is real, and we're with you: a context window stuffed with stale gotchas is its own kind of token burn. That's exactly why the reports flag context bloat and call out when something should be condensed or split rather than just added to. The point is a setup that compounds and gets leaner, not a pile that grows until it's unmanageable again 😂 Would love to see what Spotlight says about how you run things, and hear where you think it gets it wrong. That's the feedback we're searching for today!

2mo ago
