Launching today

Spotlight by Backplanes
Session reports for Claude Code & Codex to improve your code
579 followers
Keep up with your agents. Spotlight reads your Claude Code and Codex sessions and shows you what your agents actually did, and how to get recursively better every session: what to fix now, what to ship better next time, what's worth sharing. One harness or seven, solo or across your team. Free.

Hey Product Hunt. We're Seth, Neil, and Nick, and we've spent a decade in security and dev tools across Google/Gmail, Valimail, Twilio, and Algolia.
We built Spotlight by Backplanes to help you keep up with your agents. It reads your Claude Code and Codex sessions and shows you what your agents actually did: session reports that make you a better engineer, every day.
This started with a scare. Neil asked Claude to fix one file. It read 47, including his ~/.ssh keys, and wrote an API key into a tracked .env. We build security software, and our own agents did this. We missed it, and caught it by accident while investigating something else.
So we looked deeper, stitching our Claude and Codex sessions together across machines. Two things floored us: how much we'd missed, and how many good moves we were making in one place but not another. Surfaced and shared, those patterns made us better, every day.
That's what Spotlight does for you. After every session, you get a report: what to fix now, what to ship better next time, what's worth sharing. One harness or seven, solo or across your team.
We're building toward a world where you can see and manage everything your agents do. Visibility is where it starts, and we think everyone deserves to know what their agents are doing, so we're making this piece free. We'll be offering paid features and automations in the near future; seeing what your agents did won't cost you. Private and secure by design, with details at backplanes.com/trust.
Install is one line, and your first report lands in ~2 minutes: Get started on backplanes.com.
Join our Slack and say hello.
We hope you love Spotlight, and we can't wait to hear what it illuminates for you.
Product Hunt Shop
@antifreeze congrats on the launch! Such a needed product right now.
@antifreeze The SSH keys story is exactly why this matters: agents move fast and the blast radius is invisible until it isn't. Congrats on the launch. Trying it today.
@ricky_farmer : "Invisible until it isn't" is pretty much the whole problem in five words, and is better than much of the copy we wrote for this launch. :)
Glad you're giving it a run today. Tell us what your first report turns up; real sessions on day one are exactly the feedback we want.
Tabstack
@antifreeze S/O for this launch! Keep up the great work 👏👏
Product Hunt
@curiouskitty This one's fun to answer, because the trick is that there's no trick: the harnesses already write everything down. Claude Code and Codex keep a transcript of every session. Spotlight's CLI watches those transcripts on your behalf, redacts sensitive info from new activity locally, and then sends the redacted version up for pattern analysis and report generation.
So attribution falls out of the session itself: it knows its user, its project, and its tools; org rollups are aggregation, not integration. Spend is computed from the same token counts the provider meters, which is why it tracks invoices closely. It's an estimate by construction, but a well-grounded one, and no OAuth or provider-side hooks are involved.
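To make the spend point concrete, here is a minimal sketch of estimating cost from per-call token counts. It is an illustration only, not Spotlight's implementation: the `Usage` fields, the `example-model` name, and the prices are all hypothetical, and real provider pricing varies by model and changes over time.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens (assumption, not real pricing).
PRICES_PER_MILLION = {
    "example-model": {"input": 3.00, "output": 15.00},
}


@dataclass
class Usage:
    """Token counts for one model call, as a transcript might record them."""
    model: str
    input_tokens: int
    output_tokens: int


def estimate_spend(usages):
    """Sum estimated cost across calls using the price table above."""
    total = 0.0
    for u in usages:
        price = PRICES_PER_MILLION[u.model]
        total += u.input_tokens / 1_000_000 * price["input"]
        total += u.output_tokens / 1_000_000 * price["output"]
    return round(total, 4)


session = [
    Usage("example-model", 120_000, 8_000),
    Usage("example-model", 40_000, 2_000),
]
print(estimate_spend(session))  # prints 0.63
```

Because the inputs are the same token counts the provider meters, the result should stay close to the invoice, but it remains an estimate whenever the price table is stale or incomplete.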
On tradeoffs: we deliberately chose to start with reading over intercepting. Real-time gating means sitting in the request path, and a proxy adds latency to every call and breaks when harnesses update. Our way, you're minutes behind live but never blocked mid-flight: a genuine trade, and one that buys zero added latency and zero workflow friction.
We'd love for you to take a look. The install is one line, and your first report lands in a few minutes. Let us know what you find! :)
Raycast
As of this year, I’ve become a “non-engineer engineer”, letting Claude and all his robot friends abuse my terminal. For example, the other day I just discovered how to actually use a `.env` file. Yes, I am embarrassed.
How was I forced to discover this? @antifreeze (who I've known for what? 20 years??) gave me a demo of Backplanes, and in a report on one of my coding sessions I saw red.
And this wasn't the only red I saw...!
Backplanes showed me the ugly underbelly of my agent sessions: leaked credentials, missing tests, sloppy patterns I’d normalized because the app I was building "worked".
By shipping like a maniac, I leaked my secrets all over the place, and Backplanes gave me actionable steps to get my shit locked down.
This isn't just “agent analytics.” It’s a backstop for the bullshit your coding agents quietly create while you’re moving at AGI speed. It's like being shown what bacteria live on your toothbrush when you stop to put it under a microscope. 🦠🤮
So if you haven't been practicing excellent agentic hygiene, give Backplanes a try.
Because behind every successful coding session is a backplane.
@chrismessina Twenty years and I pay it off by showing you the bacteria on your toothbrush. 😅 Don't be embarrassed about the red -- we build security software and our own reports lit up too. It would be embarrassing if it weren't happening to everyone.
"Non-engineer engineers" are exactly why we made seeing this free: everybody's shipping like maniacs now, and everyone deserves to know what their agents are doing. (And "behind every successful coding session is a backplane" is going on a shirt.)
The session report angle makes sense. What I'm curious about is how much signal you're actually extracting versus just replaying what happened. Claude Code sessions can get noisy fast, lots of back-and-forth, abandoned branches, retried prompts, and the raw transcript isn't that useful without some layer of interpretation on top. Does Spotlight surface things like where the agent got stuck, or which tool calls failed silently, or is the report mostly a structured summary of the final output? Also curious what the security topic covers here, whether you're flagging things like secrets exposed in prompts or risky code patterns the agent introduced, since that would be a genuinely different use case than the reporting side.
You're spot on @fberrez1 : the session replay is the easy part; the interpretation layer is the product.
Short version: the raw session info is the input, not the report. The bookkeeping (counts, files touched, domains, cost) is computed mechanically, and the analysis on top is held to one rule: every finding has to point at the specific moment in the session it came from. If it can't cite the event, it doesn't ship.
On noise: that's most of what the engineering portion of the report is for. It surfaces retry storms, redundant tool loops, repeated lookups that should have been cached, and it distinguishes failing retries from deliberate re-verification. Those land as "Faster Next Time" items with payoff grounded in the session, like "~60 calls collapsed into one." CI, test, and lint outcomes get flagged when the transcript shows them. And when something isn't observable, the report says so in a blind-spots section instead of guessing. We'd rather show you an empty field than an invented one.
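As a rough illustration of the retry-storm idea, the sketch below groups consecutive identical tool calls and labels each long run by whether it mostly failed. Everything here is an assumption for the example: the `ToolCall` shape, the `min_run` threshold, and the labels are hypothetical, and telling deliberate re-verification apart from a redundant loop would need more session context than this heuristic uses.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation from a session transcript (hypothetical shape)."""
    tool: str
    args: str
    ok: bool


def classify_repeats(calls, min_run=3):
    """Flag runs of identical consecutive calls, citing where each run starts."""
    findings = []
    position = 0
    for (tool, args), group in groupby(calls, key=lambda c: (c.tool, c.args)):
        run = list(group)
        start, position = position, position + len(run)
        if len(run) < min_run:
            continue
        failures = sum(not c.ok for c in run)
        # Mostly-failing runs look like a retry storm; all-success runs are
        # repeats that may be re-verification or a cacheable lookup.
        kind = "failing_retries" if failures > len(run) / 2 else "repeated_success"
        findings.append({
            "kind": kind,
            "tool": tool,
            "args": args,
            "first_event": start,  # index of the event the finding points at
            "count": len(run),
        })
    return findings


calls = (
    [ToolCall("bash", "npm test", False)] * 3
    + [ToolCall("bash", "npm test", True)]
    + [ToolCall("read", "config.json", True)] * 3
)
for finding in classify_repeats(calls):
    print(finding)
```

Each finding carries `first_event`, in the spirit of the rule that a finding has to point at the moment in the session it came from.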
Security is a separate findings stream, severity-ordered, with categories like credential, shell, file, network, production, and subagent. Concretely: a live-looking key written into a tracked .env (with a paste-into-Claude prompt to rotate it), a destructive command against prod with no dry run, a call to a domain you've never used, a subagent reaching outside the project. One detail worth knowing: secrets are redacted on your machine before anything uploads and a second pass is run on the server before we write, so the report can flag the secret class without ever holding the value.
You're right that those are two different use cases. The report carries both on purpose: the security stream and the engineering narrative come from one pass over the same session and give you the full picture. That's the bet we're making.
Run it on your messiest session and tell us what you find, here on our community Slack. :)
Tabstack
@fberrez1 @gogogadgetneil FWIW you can see a sample report here: https://www.backplanes.com/features/session-reports
@fberrez1 Neil's got the depth covered, so just one top-line note: the part that surprised us most is how high-signal the reports turned out to be. Even on our own sessions -- and we live in this thing -- they cut through all the back-and-forth and retries and showed us things that really mattered, starting with credentials we'd leaked and never noticed. We were expecting most reports to have very little useful signal, and that you'd have to wait for things to show up in aggregate before the really meaningful items bubbled up. Nope! The important stuff is right there from the get-go. Would love to hear what you experience and whether it matches ours. Thank you!
The OS-level instrumentation approach is smart. It captures what agents actually do rather than what they report back. We've run into exactly this problem: an agent silently inlining an API key when it couldn't find the env var, and that key landing in git history. How do you distinguish intentional credential usage in test fixtures from actual leakage?
@anand_thakkar1, that war story could be one of ours: an agent improvising around a missing env var by quietly inlining the key is the kind of move nobody catches in review. One small correction: it's not OS-level instrumentation. Spotlight reads only the session transcripts the harnesses themselves write, nothing else on your machine. But your key insight holds: it's what the tools actually did, not what the agent says it did.
On fixtures vs leakage, we treat those as two different jobs. Redaction is deliberately paranoid: anything secret-shaped gets masked on your machine before upload, fixtures included, because that step shouldn't be guessing intent. The judgment lives in the analysis: where the credential landed, whether it looks live, and what the session was doing at the time, with severity reflecting that context. A dummy key in a test fixture and a live-looking key written into a tracked file are very different findings. Every finding carries its evidence, so on close calls you're the judge, with the receipts in front of you.
We'd rather flag a fixture at low severity than miss a live key. That asymmetry is on purpose.
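For readers who want a picture of "mask anything secret-shaped, keep only the class," here is a minimal sketch. It is not Spotlight's redactor: the two patterns and the `[REDACTED:...]` placeholder format are assumptions for illustration, and a real redactor would cover far more secret shapes.

```python
import re

# Illustrative patterns only (assumption): an AWS-style access key ID and a
# PEM private key block.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    ),
}


def redact(text):
    """Mask every secret-shaped match and return the classes that were seen.

    No attempt is made to guess intent: a fixture key is masked like a live one.
    """
    found = []
    for name, pattern in SECRET_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        if count:
            found.append(name)
    return text, found


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
line = "export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE"
clean, classes = redact(line)
print(clean)    # prints export AWS_ACCESS_KEY_ID=[REDACTED:aws_access_key_id]
print(classes)  # prints ['aws_access_key_id']
```

Downstream analysis only ever sees the placeholder, so it can reason about the class of secret and where it landed without holding the value.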
Tabstack
Thanks for the support! Let's spread the word on LinkedIn, repost this
The interpretation layer on top of raw transcripts is the real product here. Distinguishing a retry storm from deliberate re-verification, or flagging a credential class without holding the value, is genuine signal extraction. We've wrestled with agent filesystem boundary decisions. How do you handle cross-session pattern detection when the same agent operates across different repos or machines?
@retain_dev Gaurav, "filesystem boundary decisions" tells me we've fought some of the same battles. :)
Short answer: the anchor is identity, not inference. The CLI is signed in as you, so every session carries the same account identity no matter which repo or machine it ran on, with the repo, harness, and model riding along as context. Patterns aggregate across the account, so the same retry habit surfaces whether it happened in your API repo on a desktop or a scratch project on a laptop. That's literally how Spotlight started for us: stitching our own sessions together across machines, and being floored by how much we'd missed and how few of our good patterns traveled.
One thing we deliberately don't do is behavioral fingerprinting to guess "same agent" across accounts: identity stays explicit and predictable. And since you mentioned boundaries: every report flags file access outside the project, per session, so drift is visible long before it needs to be policy.
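The "file access outside the project" check can be sketched in a few lines. This is a hypothetical illustration, not Spotlight's code: the function name and inputs are assumptions, and it needs Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def outside_project(project_root, accessed_paths):
    """Return the accessed paths that resolve to somewhere outside the project."""
    root = Path(project_root).resolve()
    flagged = []
    for raw in accessed_paths:
        path = Path(raw)
        if not path.is_absolute():
            # Relative paths are interpreted relative to the project root.
            path = root / path
        # resolve() normalizes ".." segments, so traversal out of the root is caught.
        if not path.resolve().is_relative_to(root):
            flagged.append(raw)
    return flagged


touched = [
    "src/main.py",
    "/work/app/README.md",
    "/work/app/../secrets.txt",
    "/home/dev/.ssh/id_ed25519",
]
print(outside_project("/work/app", touched))
```

Here the last two entries are flagged: one escapes the root through `..`, and the other is an absolute path elsewhere on the machine.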
I can definitely see the value here. It reminds me that I have this nagging fear as I'm building with agents, with that inside voice constantly asking "what are you really doing under the hood?" The great thing about humans doing the development is we're slow! That acts as a natural fishing net to catch inadvertent security disasters. But as we transition to agentic engineering, speed will overwhelm us unless we have tools like this.
As the other conversations here have mentioned, the real secret sauce is how you "discover" these issues. It's going to be tricky across different models and platforms, as each will probably have its own screw-up signatures that will need tuning.
@jay_steele "we're slow, and that acts as a natural fishing net" might be the best articulation of "why now?" I've read all day. Review capacity was never really designed; it was a byproduct of human pace. Agents removed the pace, and the net went with it. :)
Your secret-sauce instinct is right too: every model and harness has its own screw-up signatures, and they drift with every release. Two things keep that tractable. The "what happened" layer is built per harness, so each platform gets read faithfully on its own terms. And the "what it means" layer judges behavior in context rather than pattern-matching known failure modes, so a new model's novel mistakes still surface as "this session did something worth your attention" before anyone's named the signature. When something genuinely can't be read, the report says so rather than guessing. The tuning never ends, you're right about that. The design just keeps it shallow.
@jay_steele thanks, Jay, appreciate this comment! On the secret sauce being tricky across platforms: we were worried about that, too. We've been surprised by how well our approach works, discovering things not only within a single platform but across multiple ones. Would love to hear what you're seeing, and what we could improve for you!