Taimoor Khan

Co-founder, Stonepath Labs. Building TAQ

2d ago

built an open source SDK for catching AI agent regressions before you ship

been building agents for a while and kept hitting the same problem: you fix a failure, later change the prompt or model, and the same failure quietly comes back. nobody catches it until a user does.

built replayd to solve this. captures failed agent runs as regression tests and replays them before you deploy. if the same failure returns after a prompt, model, or tool change, it catches it.
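a rough sketch of the capture-and-replay idea, in case it helps. this is not replayd's actual API: `FailureCase`, `capture`, `load`, and `replay` are hypothetical names, and the agent is assumed to be a plain callable that returns the names of the tools it called.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable


@dataclass
class FailureCase:
    """Hypothetical record of one failed agent run (not replayd's schema)."""
    case_id: str
    user_input: str
    failure_kind: str    # e.g. "wrong_tool"
    forbidden_tool: str  # the tool the agent wrongly called in the failed run


def capture(case: FailureCase) -> str:
    """Serialize a failed run so it can be stored alongside the test suite."""
    return json.dumps(asdict(case))


def load(raw: str) -> FailureCase:
    """Rebuild a stored case from its JSON form."""
    return FailureCase(**json.loads(raw))


def replay(cases: list[FailureCase],
           agent: Callable[[str], list[str]]) -> list[str]:
    """Re-run every stored input and return the ids whose failure came back.

    `agent` is assumed to take the user input and return the tool names
    it called, in order.
    """
    regressed = []
    for case in cases:
        tools_called = agent(case.user_input)
        if case.forbidden_tool in tools_called:
            regressed.append(case.case_id)
    return regressed
```

in a deploy pipeline you would run `replay` against the new prompt or model and fail the build if the returned list is non-empty.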

the grading part was the interesting problem. can't use exact output matching because LLM output is non-deterministic. so instead of checking the text, it checks whether the specific failure came back: a wrong tool call gets a hard assertion, a policy violation gets an LLM judge.
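the two grading paths could look roughly like this. again a sketch under assumptions, not how replayd implements it: `Run`, `grade_wrong_tool`, and `grade_policy` are made-up names, and the judge is assumed to be any callable that takes a prompt and returns the model's text reply, so you can plug in whatever LLM client you use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Run:
    """Hypothetical summary of one replayed agent run."""
    tool_calls: list[str]
    output: str


def grade_wrong_tool(run: Run, bad_tool: str) -> bool:
    """Hard assertion: True means the wrong tool was called again."""
    return bad_tool in run.tool_calls


def grade_policy(run: Run, policy: str, judge: Callable[[str], str]) -> bool:
    """LLM-judge check: True means the judge says the policy was violated.

    `judge` is an assumed interface: prompt in, text reply out.
    """
    prompt = (
        f"Policy: {policy}\n"
        f"Agent output: {run.output}\n"
        "Does the output violate the policy? Answer YES or NO."
    )
    verdict = judge(prompt)
    return verdict.strip().upper().startswith("YES")
```

the tool check is deterministic, so it can gate a deploy on its own. the judge path is only as reliable as the judge model, so it is worth keeping the prompt narrow (one policy, yes/no answer) rather than asking for a general quality score.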

v0.1.2, early but works end to end. zero runtime dependencies in the core.