Launching today

Bench for Claude Code
Store, review, and share your Claude Code sessions
498 followers
Claude Code just opened a PR. But do you really know what it did? With Bench you can automatically store every session and easily find out what happened. Spot issues at a glance, dig into every tool call and file change, and share the full session with others through a single link: no extra explanation needed. When things go right, embed the history in your PRs. When things go wrong, send the link to a colleague to ask for help. Free, no limits. One prompt to set up on Mac and Linux.
Bench for Claude Code
Hey Product Hunt! 👋
I’m Manuel, co-founder of Silverstream AI. Since 2018, I’ve been working on AI agents across Google, Meta, and Mila. Now I’m building Bench for Claude Code with a small team.
If you use Claude Code a lot and want to store, review, or share its sessions, this tool is for you. Once connected, Bench automatically records and organizes your sessions, letting you inspect and debug them on your own or share them with your team to improve your workflows.
Getting started is simple:
• Go to bench.silverstream.ai and set it up in under a minute on Mac or Linux
• Keep using Claude Code as usual
• Open Bench when you need to understand or share a session
That’s it.
Bench is completely free. We built it for ourselves and now want as many developers as possible to try it and shape it with us.
We’ll be here all day reading and replying to feedback (without using Claude 😂). Would love to hear what you think!
Btw, support for more agents is coming soon, so stay tuned!
@manuel_del_verme Many congratulations on the launch, Manuel and team! :)
This brings much-needed visibility into Claude Code sessions, especially for debugging + collaboration for async teams. Do you also plan to add deeper analytics or insights (like patterns across sessions or common failure points) to help developers improve workflows over time?
Bench for Claude Code
@manuel_del_verme @rohanrecommends Hey! A warm thank you from the team side! :)
We already provide a few basic but effective ways to speed up session analysis, namely key insights and recaps, and our next goal is to iterate on these to offer easier and faster ways to analyse your sessions. So yes, we are absolutely going to push further in that direction.
The idea of spotting cross-session patterns, however, is something we haven't considered yet, and it's really intriguing! We'll definitely look into it. After all, that's the goal of this launch: to get as many people as possible to test Bench and help us identify how to improve it! :)
Thank you!
Bababot
I’m curious how detailed the tracking is. If I can really see every tool call and file change clearly, I can imagine using this for debugging more than anything else.
Bench for Claude Code
@aarav_pittman It's as detailed as Claude Code allows us to get, which is quite a lot :) We capture everything about tool calls and file changes, but also subagent runs and all the steps that are sometimes hidden from Claude's terminal output. And yes, debugging was our original reason for building Bench: a development tool that let us finetune automated task prompts and make them more reliable.
Once the tool existed, we realized it had lots of other uses: storing the whole conversation that led to a feature being developed a certain way, and then sharing it with colleagues, turned out to be very useful too. We had to pick one aspect to focus on for this launch, but yeah, debugging is definitely another great use for Bench! :D
I’ve been using Claude Code quite a bit, and I often lose track of what actually happened in a session. This idea of being able to go back and inspect everything feels really useful for me.
Bench for Claude Code
@amard_sonal That's precisely how I mostly use the product myself these days! It's always insightful to take a second look at all the commands Claude Code launches... you would never imagine how often this guy tries to replace my local supabase setup with its own non-working docker containers! :S Through Bench, I can at least understand how it did it and how to fix it :)
How deep does it go when tracking tool calls and file changes across a session?
Bench for Claude Code
@hamza_afzal_butt As deep as possible :) The whole goal of Bench is to trace as many details as possible about every action the agent performs, and then let you review them and spot the details you're looking for easily and quickly! The only limit is what Claude Code allows us to extract, which is quite a lot anyway! For tool calls, we extract all the details about the command used to launch the tool, plus the "origin" of that call: whether it came from the conversation itself or from a subagent run with a specific goal to reach.
For file changes, it's basically the same: we show the delta, of course, but also why and when the agent decided to apply that specific change.
Now add observability + failure handling, otherwise it’s just scheduled guessing.
Bench for Claude Code
@ion_simion_bajinaru That's exactly what we're here for :) Providing observability for your sessions, both after the fact and in real time!
Premarket Bell
How granular is the session tracking? Can you trace decisions step by step, or is it more of a high-level overview?
Bench for Claude Code
@daniel_henry4 The goal of the tool is to give you every specific detail about the whole process: you can follow all actions, subagent calls, and decisions taken during a session, so we store data in the most detailed way possible.
Of course, this quickly becomes a lot to manage, especially on longer sessions: imagine having to troubleshoot a session with 200 steps, or more! For this reason we also provide a set of tools to skim through the steps and highlight the ones you really care about. Some are incredibly simple, such as grouping steps by type of action, while others are more refined, such as flagging commands that may be potentially concerning. This is also the area where we'll focus most in the future: providing as much detail as possible while keeping session analysis as quick as possible!
Bench for Claude Code
Claude Code is so capable that we end up trusting it a little too much. But that's exactly when things get interesting:
I've had it silently migrate my local DB to an incompatible version while fixing a bug.
Another time, Claude decided that the only way to fix a particularly inefficient for loop was to turn off my audio drivers.
The real problem isn't that it made mistakes. It's that I had no way to go back and understand what it did, when, and why, so I could learn from it and finetune my prompts. Sure, I could just scroll the Claude logs, but what if the "failures" weren't apparent until much later? Or what if the issue was at step 315 of an hour-long, 500-step agent run?
That's why Bench is a big deal. It's not just a logger, but an audit trail that makes agent actions legible: every tool call, file change, conversation, and subagent detail is there for as long as you need it, searchable and shareable. It's a great way to share your context with colleagues, and exactly what I needed to learn from my mistakes and get better at writing prompts!