Launched this week

Retrace

Name: Retrace
Rating: 4.0 (1 review)

Debug AI agents by replaying and forking runs

109 followers

AI Chatbots • AI Metrics and Evaluation • AI Engineer

Record, replay, fork & share AI agent executions. See every LLM call, tool invocation, and error your agent makes, then debug and iterate in seconds. Free for 1,000 traces/mo.

Free

Launch tags: Productivity • Developer Tools • Artificial Intelligence

Finally something that makes debugging AI agents less painful. The forking feature let me branch off a stuck trace and rerun it with a different prompt in like a minute. Super practical for anyone shipping agents right now.

1d ago

The replay feature is genuinely useful: I reran a flaky agent run and could pinpoint exactly where it stalled without digging through logs. The free tier is enough to actually evaluate it before committing.

1d ago

Love how clean the replay view is. Being able to scrub through each LLM call and tool invocation without losing context makes debugging agents feel way less like guesswork.

1d ago

Spent a few minutes replaying a flaky agent run, and being able to fork the exact trace to try a different prompt without rerunning the whole thing was honestly a nice surprise. The tool call breakdown finally makes it obvious where my agent was looping.

1d ago

Forum Threads

p/retrace-2 • 2d ago

How do you tell a real regression from model noise when replaying a run?

When you replay or fork a run in Retrace, the steps before the fork come from the recording, but everything after runs live against the model. So two runs of the same input rarely match exactly, even when nothing actually broke.
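To make the recorded-versus-live split concrete, here is a minimal sketch in Python. It is not Retrace's API: `fork_run`, `live_step`, and the list-of-strings step format are all hypothetical names chosen for illustration, and a real agent step would carry far more than a string.

```python
from typing import Callable, List


def fork_run(
    recorded: List[str],
    fork_at: int,
    live_step: Callable[[List[str]], str],
    total_steps: int,
) -> List[str]:
    """Reuse recorded outputs before fork_at, then run the rest live.

    Hypothetical helper for illustration only; not part of any real SDK.
    """
    if not 0 <= fork_at <= len(recorded):
        raise ValueError("fork_at must lie within the recording")
    # Steps before the fork are copied from the recording: no model call.
    steps = list(recorded[:fork_at])
    # Steps from the fork onward are produced live, so their output
    # can differ from run to run even with identical input.
    while len(steps) < total_steps:
        steps.append(live_step(steps))
    return steps


# Usage: a stub stands in for the live model call.
recorded = ["plan", "search", "answer"]
forked = fork_run(recorded, 2, lambda history: f"live-{len(history)}", 4)
```

In this sketch only the first two steps are guaranteed to match the recording; anything the stub (or a real model) returns after the fork is new output.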

That makes the useful question harder than it sounds: when a replay diverges, is it a real regression from your change, or just provider non-determinism? Retrace currently shows a first-divergence diff and a verdict of improved, regressed, or unchanged, but I would like to hear how others handle it. What tolerance do you use in practice, and would you rather see a strict step-by-step diff or a semantic comparison of each step?
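One way to frame the tolerance question is a first-divergence check with an adjustable similarity threshold. The sketch below is an assumption-laden illustration, not how Retrace computes its diff: `first_divergence` and `tolerance` are hypothetical names, steps are modeled as plain strings, and character-level similarity from Python's standard `difflib` is only a crude stand-in for a real semantic comparison.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def first_divergence(
    baseline: List[str],
    candidate: List[str],
    tolerance: float = 1.0,
) -> Optional[int]:
    """Return the index of the first step that differs, or None.

    tolerance=1.0 is a strict diff: any textual difference counts.
    Lower values accept steps whose similarity ratio is at least
    `tolerance` (hypothetical knob, for illustration only).
    """
    for i, (a, b) in enumerate(zip(baseline, candidate)):
        if SequenceMatcher(None, a, b).ratio() < tolerance:
            return i
    # One run has extra steps: it diverges where the shorter one ends.
    if len(baseline) != len(candidate):
        return min(len(baseline), len(candidate))
    return None


baseline = ["call search('cats')", "The answer is 42."]
candidate = ["call search('cats')", "The answer is 42!"]

strict = first_divergence(baseline, candidate)        # strict step-by-step diff
loose = first_divergence(baseline, candidate, 0.9)    # tolerates small wording drift
```

With the strict setting the trailing punctuation change is flagged at step 1; with a 0.9 threshold the two runs count as matching. A threshold like this cannot tell a harmless rephrasing from a changed fact, which is why a semantic comparison per step may still be needed.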
