
Bench for Claude Code
Store, review, and share your Claude Code sessions
716 followers
Claude Code just opened a PR. But do you really know what it did? By using Bench you can automatically store every session and easily find out what happened. Spot issues at a glance, dig into every tool call and file change, and share the full context with others through a single link: no further context needed. When things go right, embed the history in your PRs. When things go wrong, send the link to a colleague to ask for help. Free, no limits. One prompt to set up on Mac and Linux.
Clipboard Canvas v2.0
I've tackled similar challenges with code reviews and context sharing, and I love how Bench automates session storage. How do you handle sensitive data in stored sessions to ensure developers aren’t accidentally sharing proprietary code?
Bench for Claude Code
@trydoff Hi there! :)
That really is a tough topic, and one we will surely iterate on in the future. For now, we've moved in these directions:
- You own your trace: you are free to delete any tracking code, along with all its related sessions, anytime you deem fit. You can even set expiration dates.
- We intentionally do not record any tool use OUTPUT, just the inputs, precisely because we want to do this right. When we do implement output recording, it will be an opt-in feature for sure.
- You can define separate tracking codes for different uses: they are configured through simple envfiles, so it's quite trivial to keep data separated, e.g. by using a disposable tracking code for activity logs you may eventually want to delete, or by disabling Bench altogether for specific projects if you need to.
- Of course, the sharing functionality is opt-in and completely under your control, so you can share only the sessions you deem right and stop sharing them whenever you want.
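For illustration, a per-project envfile with a disposable tracking code might look something like this (these variable names are hypothetical, not Bench's actual configuration keys, so check the docs for the real ones):

```shell
# .env for a throwaway experiment (hypothetical variable names)
# A disposable tracking code keeps these sessions separate,
# so they can all be deleted later in one go
BENCH_TRACKING_CODE=disposable-experiments

# In a project where sessions should never be stored,
# a hypothetical kill switch might look like:
# BENCH_ENABLED=false
```

Keeping one envfile per project makes the data boundary explicit: whatever is logged under a tracking code can be wiped by deleting that code.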
It is also worth mentioning that our company, Silverstream, is part of a larger AI Alliance collaboration with CUBE (https://arxiv.org/abs/2603.15798), and we are in the process of releasing an open-source version, which should fully address this concern if it worries you.
I also encourage you to contact us at manuel@silverstream.ai to get further details about the whole process :)
Bench for Claude Code
Claude Code is so capable that we end up trusting it a little too much. But that's exactly when things get interesting:
I've had it silently migrate my local DB to an incompatible version while fixing a bug.
Another time, Claude decided the only way to fix an issue with a particularly inefficient for loop was to turn off my audio drivers.
The real problem isn't that it made mistakes. It's that I had no way to go back and understand what it did, when, and why, to learn from it and fine-tune my prompts. Sure, I could just scroll the Claude logs, but what if the "failures" weren't apparent until much later? Or what if the issue was at step 315 of an hour-long agent run of 500 steps?
That's why Bench is a big deal. It's not just a logger but an audit trail that makes agent actions legible: every tool call, file change, conversation, and subagent detail is there for as long as you need it, searchable and shareable. It's a great way to share your context with colleagues, and exactly what I needed to learn from my mistakes and write better prompts!
Now add observability + failure handling, otherwise it’s just scheduled guessing.
Bench for Claude Code
@ion_simion_bajinaru That's exactly what we are here for :) Providing observability for your sessions, both scheduled and in real time!
heyy does this work only with claude? or i can use it with gemini, codex too?
Bench for Claude Code
@janhavidadhania Hey Janhavi! We currently support only Claude Code. However, support for other agents is a high priority on our feature list, so stay tuned! You can subscribe at bench.silverstream.ai to get notified when support for other agents becomes available!
Bench for Claude Code
Hey folks! I’m Simone, Co-founder and CTO of Silverstream AI.
Really happy to be launching this today. I’m excited to share it, and very curious to hear your feedback!
One habit we’ve introduced across the team is linking Bench sessions in PRs whenever Claude Code was involved in creating or debugging a change. It gives reviewers a lot more context on how a bug was found and fixed, instead of just showing the final diff.
That’s been one of the most useful workflows for us, and I’d recommend it to other teams using Claude Code too.
I’m also using Bench in a research setting, where session data helps generate detailed methodology reports showing how results were obtained. I’m already finding it useful, and I think there’s a lot more to unlock there!
Looking forward to your thoughts. I want to make Bench as useful for other devs as it's been so far for us, and your input really matters!
OpenOwl
The session sharing feature is what makes this stand out. I've lost count of how many times I've wanted to show a teammate "hey look what Claude did here" and had to resort to copy-pasting terminal output into Slack.
Being able to just send a link to a full session with context would save so much back and forth. Especially useful for code reviews where you want to show the AI's reasoning, not just the final diff.
Bench for Claude Code
@mihir_kanzariya Exactly, that's one of the main reasons we created Bench! I hope we've effectively solved your issue. When you get the chance to try it, please let me know how the experience was and how we can improve the product!
This is interesting.
Feels like we’re starting to treat AI outputs more like artifacts that need to be tracked and reviewed, not just ephemeral responses.
Curious how you think about evaluation over time.
For example, do you see this becoming something like a feedback loop where past sessions actually improve future workflows?
Bench for Claude Code
@bigcat_aroido Exactly, and that's very much the direction we're heading!
Storing sessions is just the foundation. The real value is what you can build on top of that history. We're already working on prompt improvement suggestions based on past failures, so Bench can tell you not just what went wrong, but nudge you toward patterns that work better.
Longer term, we envision Bench as a context optimizer: it knows your agent's full history, recognizes recurring failure patterns, and can proactively suggest solutions, like automatically recommending a skill when it detects an issue that one would reliably fix.
So yes, the feedback loop you're describing is exactly what we're building toward!
@younik That direction makes a lot of sense.
Feels like once you have enough history, the real value shifts from storage to actually guiding behavior.
Curious how far you want to take that.
Do you see it more as suggesting improvements, or eventually something that can actively adjust workflows?
Bench for Claude Code
@bigcat_aroido Definitely something that can adjust the workflow! Once we have reliably tested the suggestions in production, gathered positive feedback, and assured their quality, users who opt in will be able to have the improvements applied directly to their memory, skills, or prompts :)
I am happy you like the direction!