Launched this week

Spotlight by Backplanes

Session reports for Claude Code & Codex to improve your code

735 followers

Command line tools • AI Metrics and Evaluation

Keep up with your agents. Spotlight reads your Claude Code and Codex sessions and shows you what your agents actually did, and how to get recursively better every session: what to fix now, what to ship better next time, what's worth sharing. One harness or seven, solo or across your team. Free.

Launch tags: Developer Tools • Artificial Intelligence • Security

Well done! Was post-session reporting a deliberate call over an inline guardrail that interrupts the agent mid-write (i.e. less intrusive, keeps you in flow)?

3d ago

Maker

@artstavenka1 Thanks, Art! Definitely deliberate, but one (good!) clarification: it's not limited to being post-session. Spotlight reports actually build while you work, just minutes behind your agent, so you're not waiting for a session to end to find out what happened.

What we deliberately avoided is the inline path. A guardrail that interrupts mid-write has to sit between you and the model, with the latency and potential breakage that path implies, and we very much want to help keep you in flow. :)

3d ago

AISA AI Skills Test

this is smart. the gap right now with AI coding agents is that most people have no feedback loop — they ship what the agent outputs and hope for the best. having session-level visibility into what actually happened is the kind of thing that separates someone who uses AI well from someone who just uses AI. curious how granular the reports get on code quality vs just activity metrics?

2d ago

Spotlight by Backplanes

Maker

@ozandag Thanks for the comment, Ozan. This is a great question!

We don't do code quality analysis yet, but what we do is deep analysis of specific sessions and also all your sessions in aggregate. The report covers a few major areas:

Things you really should watch out for:
- Was a credential sent out inappropriately?
- Was PII sent to a website that wasn't known?
- Did the agent access production without knowledge or consent?
- etc.

The second thing we do is look for patterns that should be replicated for you and across your teams. There are certain things you do really well that are worth replicating in those and other places, like TDD use.

The third thing we do is find ways to speed you up. If there are certain things you're doing that could be optimized, from either a token perspective or a speed perspective, we highlight those as well. For example, I had an instance that auth'ed to the same service 4 times, killing 25 minutes.

The fourth major thing we do is talk about how you use your tokens: where they go, which MCPs, etc. are being used.

In addition to that, we also give stats on which codebases you work in the most, CI/CD passes and failures, abandoned work, and GitHub activity. And we do this across all sessions for both Codex and Claude Code together.

If you work with somebody else, we provide team-level insights as well.

Code quality is not a bad idea though! We'll add it to our backlog. Thanks so much for the great question, Ozan!
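
For readers wondering what checks like "was a credential sent out inappropriately?" or "auth'ed to the same service 4 times" might look like in practice, here is a minimal Python sketch. It is not Spotlight's implementation. The event shape (dicts with `tool`, `target`, and `payload` keys), the function names, and the two patterns are all assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative patterns only; a real scanner would need a far broader set.
CREDENTIAL_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID format
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]


def find_credential_leaks(events, known_hosts):
    """Return events that carry a credential-like payload to a target outside known_hosts."""
    leaks = []
    for event in events:
        if event.get("target") in known_hosts:
            continue  # traffic to a known host is not flagged in this sketch
        payload = event.get("payload", "")
        if any(pattern.search(payload) for pattern in CREDENTIAL_PATTERNS):
            leaks.append(event)
    return leaks


def repeated_auths(events, threshold=2):
    """Return targets that were authenticated against more than `threshold` times."""
    counts = Counter(e["target"] for e in events if e.get("tool") == "auth")
    return {target: n for target, n in counts.items() if n > threshold}
```

Both functions take a plain list of events, so they could run over a single session or over many sessions concatenated together.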

2d ago

The session-report angle for Claude Code/Codex is pretty useful. I might have missed it, but do you separate “what changed” from “why the agent made that choice”? That distinction matters a lot when reviewing messy agent runs.

2d ago

Spotlight by Backplanes

Maker

@xiaosong001 Thanks for the question. This is a good one! We capture reasoning and subagent behavior in our analysis of the session. We know the context around each response, reasoning step, subagent, and tool call is really important for understanding why something became messy or went off the rails. That's why we make sure the entire session is processed (or at least processed in very large chunks, for you hardcore people with 300h+ sessions). Thanks for your comment, and the insight!

2d ago

This is a useful direction. For coding agents, the hard part is usually not generating more code, it is making the session reviewable afterward.

The report I’d want is pretty boring: changed files, risky assumptions, tests/checks run, failed attempts, and a short “what a human should look at first” section.

3d ago

Maker

@kevinzrzgg, you seem to have written our report spec almost exactly. :)

Changed files: all files read and written are in there. Tests and checks: flagged with their outcomes when the session shows them. Failed attempts: called out, including the distinction between deliberate re-verification and flailing retries. "What a human should look at first": that's the top of the report, a one-line verdict with the main outcome, then findings ordered by severity and guidance on what to do for each.

The one we can only claim half credit on is risky assumptions: concrete risky choices surface as findings, and a blind-spots section names what the report couldn't verify, but a dedicated assumptions section is a great idea.

And we're with you on boring: the standing rule inside the report is no invented findings and no padded advice. An empty section beats a manufactured one.
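
The report layout described in this reply (a one-line verdict, then findings ordered by severity with guidance for each) can be sketched in a few lines of Python. This is only an illustration under assumed names: `Finding`, `render_report`, and the four severity levels are hypothetical and not taken from Spotlight.

```python
from dataclasses import dataclass

# Hypothetical severity levels; lower number sorts first.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Finding:
    severity: str
    summary: str
    guidance: str


def render_report(verdict, findings):
    """Put the one-line verdict first, then findings from most to least severe."""
    lines = [verdict]
    for finding in sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity]):
        lines.append(f"[{finding.severity}] {finding.summary} -> {finding.guidance}")
    return "\n".join(lines)
```

With no findings the output is just the verdict line, which matches the "empty section beats a manufactured one" rule.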

3d ago

Been waiting for a product like this to arrive. Coding agents broke the feedback loop that used to make engineers better. You don't write the code, you don't review most of it, so where does the learning come from? Everything in the stack accelerates output; nothing closes the loop back to the human. Session reports as a feedback mechanism (not just an audit trail) is the right shape for that. The "what's worth keeping" part matters more than the scary findings, IMO.

3d ago

Tabstack by Mozilla

Hunter

@kcpike Great framing! Shout-out to the makers for building this.

2d ago

Maker

@kcpike "Nothing closes the loop back to the human" is the cleanest statement of the problem I've seen yet. :) That's exactly it: review used to be where engineers got better, and agents quietly took that away while speeding everything else up.

2d ago

Spotlight by Backplanes

Maker

@kcpike 1000%. Everything is changing under our feet at a crazy rate, so the faster we learn and improve, the better we get and the faster we all go! Recursively compounding returns!

2d ago

The SSH-key story is exactly the gap. Session reports show the blast radius after the fact. Are you thinking about pre-action gates too, like: this agent can read these paths, write only there, and ask before secrets or env files?
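
To make the idea of a pre-action gate concrete, here is a minimal Python sketch of the kind of policy described in this question: read some paths, write to fewer, and ask before touching secrets or env files. Everything in it is hypothetical, including the `POLICY` layout, the glob patterns, and the `decide` function. It is not a Spotlight feature or API.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical policy; patterns are illustrative, not a recommended set.
POLICY = {
    "read": ["src/*", "docs/*"],
    "write": ["src/*"],
    "ask": [".env", "*.env", "*.pem", "id_rsa*"],
}


def decide(action, path, policy=POLICY):
    """Return "ask", "allow", or "deny" for an agent action on a path."""
    name = PurePosixPath(path).name
    # Sensitive file names always need human approval, whatever the action.
    if any(fnmatchcase(name, pattern) for pattern in policy["ask"]):
        return "ask"
    # Otherwise the path must match an allow pattern for this action.
    if any(fnmatchcase(path, pattern) for pattern in policy.get(action, [])):
        return "allow"
    return "deny"
```

For example, `decide("read", "src/app/main.py")` returns `"allow"`, `decide("write", "docs/readme.md")` returns `"deny"`, and `decide("read", "src/.env")` returns `"ask"`. Checking the "ask" patterns first is a deliberate choice in this sketch, so a broad allow rule can never silently cover a secrets file.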

19h ago

Spotlight by Backplanes

Maker

@blah_mad 1000%, that's exactly the roadmap: helping you set policy so that these types of things can be prevented or steered in the right direction, and creating an automated feedback loop between Spotlight and your coding harness so it self-improves in the manner you want.

12h ago

@antifreeze yes, that loop is the sharp part. The tricky bit is separating "learn from this session" from "change policy automatically." I would want a small approval gate there, otherwise the safety layer becomes another agent with permissions.

11h ago

DiffSense

$300 for 50 min of coding. What kind of models are you running? 😅 How does it get recursively better each session? I don't get it. Reminds me of entire.io

3d ago

Maker

@conduit_design Ha, right?! The wild part: that's the agents' own tab, we just hand you the receipt. It's crazy how quickly token usage accelerates when you're running multiple subagents on an intensive job, and Fable pricing is going to make this even more fun for all of us soon. 😅

On "recursively better," the idea is that it's a loop with you in it. The model never changes; your setup does. Each report turns what happened into concrete and actionable advice: a fix to apply, a CLAUDE.md line to include, a Skill to draft. Your agent loads that richer setup next session and starts more informed than the last one.

3d ago

DiffSense

@gogogadgetneil Yeah, but that's a slippery slope. You can load all the prerequisite gotchas you want, but most of the time a clean slate, an undiluted context window, is what you need to solve a task. Sometimes it's the non-deterministic side of AI that solves things. Making it more deterministic can send it down the same rabbit holes that lead nowhere, and actually burn more tokens, because it has to deal with so many constraints. It feels overwhelmed and does anything to pass the test, even if the result is contrived. On the bill side, I don't get how 55 min of coding now costs $300. Maybe try the Asian models? There are ways. There is a project called backdoor that uses Asian models in Claude Code, etc. Personally, I run agents differently altogether, for free. But we can talk about that another time :D

3d ago

Spotlight by Backplanes

Maker

@gogogadgetneil @conduit_design The slippery slope is real, and we're with you -- a context window stuffed with stale gotchas is its own kind of token burn. That's exactly why the reports flag context bloat and call out when something should be condensed or split rather than just added to. The point is a setup that compounds and gets leaner, not a pile that grows until it's unmanageable again 😂 Would love to see what Spotlight says about how you run things, and hear where you think it gets it wrong. That's the feedback we're searching for today!

2d ago
