Launching today

Retrace

Debug AI agents by replaying and forking runs

51 followers

AI Chatbots • AI Metrics and Evaluation • AI Engineer

Record, replay, fork & share AI agent executions. See every LLM call, tool invocation, and error your agent makes, then debug and iterate in seconds. Free for 1,000 traces/mo.

Free

Launch tags: Productivity • Developer Tools • Artificial Intelligence

Launch Team

Retrace

Maker

📌

Retrace records every LLM call, tool call, and error in a run as a span inside a trace. You can replay a past run step by step, like scrubbing through a video. When you find the step that broke, you fork it: change the input or model at that point, and the agent re-executes from there, so you can compare the original and the new path side by side. The part I care most about is the forking: it's closer to git branching than to re-running a prompt. Pre-fork steps replay from the recording; everything downstream runs live. It's early, and I'd really like your feedback, especially on the replay and fork flow and on what would make it fit your stack. Which frameworks or providers are you using? Happy to answer anything here.
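To make the replay/fork split concrete, here is a minimal sketch of the idea in plain Python. It is not Retrace's API: `Span`, `fork`, and `fake_execute` are hypothetical names, and the sketch assumes a simple linear run where each step's output becomes the next step's input and the number of steps stays the same after the fork.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Span:
    """One recorded step of a run (hypothetical shape, not Retrace's schema)."""
    kind: str           # "llm", "tool", or "error"
    name: str
    input: str
    output: str
    live: bool = False  # False: output came from the recording


def fork(trace: List[Span], at: int, new_input: str,
         execute: Callable[[Span, str], str]) -> List[Span]:
    """Reuse recorded spans before `at`, then re-execute the rest live."""
    if not 0 <= at < len(trace):
        raise IndexError("fork point is outside the trace")
    branch = list(trace[:at])      # pre-fork steps: recorded outputs, untouched
    current_input = new_input      # the change made at the fork point
    for span in trace[at:]:
        output = execute(span, current_input)  # downstream steps run live
        branch.append(replace(span, input=current_input, output=output, live=True))
        current_input = output     # feed each live output into the next step
    return branch


# Usage with a stand-in executor instead of a real model or tool.
recorded = [
    Span("llm", "plan", "book a flight", "search flights"),
    Span("tool", "search", "search flights", "no results"),
    Span("llm", "answer", "no results", "sorry"),
]


def fake_execute(span: Span, text: str) -> str:
    return f"{span.name}({text})"


branch = fork(recorded, at=1, new_input="search flights to Oslo", execute=fake_execute)
for old, new in zip(recorded, branch):
    print(f"{old.name}: {old.output!r} -> {new.output!r} (live={new.live})")
```

The original trace is never mutated, which is what makes the side-by-side comparison possible: `recorded` and `branch` share the pre-fork spans and differ only from the fork point on.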

21h ago

Forking a run like a git branch is exactly how agent debugging should work. Replay alone rarely helps when the failure came from one weird tool response ten steps in.

Also went through your forum thread on separating real regressions from provider noise — nice to see nondeterminism treated as a first-class problem (first-divergence diff + verdict) instead of being waved away.

One thing I couldn't find though: when everything downstream of the fork runs live, do the agent's tool calls actually execute?

I work on agents with real side effects (checkout, payments, emails), and mocking those from the recording would be the difference between "safe to fork production runs" and not.
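For illustration, one way "mocking those from the recording" could look is sketched below. This is an assumption about a possible design, not a description of what Retrace does; the tool names, `SIDE_EFFECT_TOOLS`, and `call_tool` are all hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical list of tools that must never run for real during a fork.
SIDE_EFFECT_TOOLS: Set[str] = {"checkout", "charge_card", "send_email"}


class UnrecordedSideEffect(RuntimeError):
    """Raised when a fork reaches a side-effecting call with no recording."""


def call_tool(name: str, args: str,
              recorded: Dict[Tuple[str, str], str],
              live_tools: Dict[str, Callable[[str], str]]) -> str:
    """Serve side-effecting tools from the recording; run everything else live."""
    if name in SIDE_EFFECT_TOOLS:
        key = (name, args)
        if key in recorded:
            return recorded[key]  # replayed response, nothing is executed
        # The fork changed the arguments, so there is no safe answer to reuse.
        raise UnrecordedSideEffect(f"{name}({args!r}) has no recorded response")
    return live_tools[name](args)


# Usage: the payment call is answered from the recording, the search runs live.
charges = []
recorded = {("charge_card", "order-1"): "charged"}
live_tools = {
    "search": lambda q: f"results for {q}",
    "charge_card": lambda order: charges.append(order) or "charged-live",
}

print(call_tool("charge_card", "order-1", recorded, live_tools))
print(call_tool("search", "flights", recorded, live_tools))
```

Failing loudly when the forked path produces arguments that were never recorded is a deliberate choice in this sketch: silently returning a stale response, or silently executing the real call, would both defeat the purpose of forking a production run safely.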

4h ago

For Retrace, when you say users can replay and fork runs, does the fork preserve the full context of the original AI agent run, or is it more about starting from a selected point in the trace? I can imagine both being useful for debugging, especially when a bad tool call or prompt change happens midway through a run.

5h ago

Replay + fork is exactly how agent debugging should work. Today my 'debugging' is reading transcripts of production calls and guessing which turn derailed it - being able to fork from the exact step and test a fix against the same context would save hours. Does it work with voice agents / live conversation logs, or is it aimed at tool-calling agents? Congrats on the launch.

4h ago

finally something that lets me actually see why my agent broke instead of digging through logs. the replay view caught a tool call loop in seconds, super useful.

5h ago

Forum Threads

p/retrace-2 • 17h ago

How do you tell a real regression from model noise when replaying a run?

When you replay or fork a run in Retrace, the steps before the fork come from the recording, but everything after runs live against the model. So two runs of the same input rarely match exactly, even when nothing actually broke.

That makes the useful question harder than it sounds: when a replay diverges, is it a real regression from your change, or just provider non-determinism? Retrace currently shows a first-divergence diff and a verdict of improved, regressed, or unchanged, but I would like to hear how others handle it. What tolerance do you use in practice, and would you rather see a strict step-by-step diff or a semantic comparison of each step?
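As a rough sketch of the two options in that question, the snippet below finds the first diverging step under a strict comparison and under a looser one, then turns a score change into a verdict. None of this is Retrace's implementation: `first_divergence`, `loose_same`, `verdict`, the per-run scores, and the 0.05 tolerance are all assumptions, and the whitespace/case normalization is only a crude stand-in for a real semantic comparison.

```python
from typing import Callable, List, Optional


def strict_same(a: str, b: str) -> bool:
    return a == b


def loose_same(a: str, b: str) -> bool:
    """Ignore case and whitespace differences (stand-in for semantic matching)."""
    return " ".join(a.lower().split()) == " ".join(b.lower().split())


def first_divergence(original: List[str], replayed: List[str],
                     same: Callable[[str, str], bool] = strict_same) -> Optional[int]:
    """Return the index of the first step that differs, or None if none does."""
    for i, (a, b) in enumerate(zip(original, replayed)):
        if not same(a, b):
            return i
    if len(original) != len(replayed):
        return min(len(original), len(replayed))  # one run has extra steps
    return None


def verdict(score_before: float, score_after: float, tolerance: float = 0.05) -> str:
    """Classify a score change; changes within the tolerance count as noise."""
    delta = score_after - score_before
    if delta > tolerance:
        return "improved"
    if delta < -tolerance:
        return "regressed"
    return "unchanged"


original = ["search flights", "No results", "sorry"]
replayed = ["search flights", "no  results", "found 3 flights"]

print(first_divergence(original, replayed))              # strict comparison
print(first_divergence(original, replayed, loose_same))  # loose comparison
print(verdict(0.50, 0.70))
```

With the strict comparator the runs diverge at step 1 on a capitalization difference; with the loose one they diverge at step 2, where the content actually changes. That gap is the practical difference between a step-by-step diff and a semantic one.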
