Launching today

MartinLoop
Control AI coding agents with limits, proof, + run receipts
39 followers
MartinLoop is the control room for AI coding agents. Today, it wraps Claude, Codex, OpenCode, and other agents with spend limits, proof checks, safety rules, rollback, and run receipts. The bigger build turns that into a full agent control plane: dashboards, HeadlessOS-style execution, team oversight, cost visibility, and a trusted record of what every agent did, why it kept going, and why it stopped.

Useful framing. The part I would want before trusting a long-running agent is a concise run receipt: budget spent, verifier failures, scope guard hits, and the exact stop reason. If you surface that per run, it becomes much easier to compare agents without rereading logs.
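To make the receipt idea concrete, here is a minimal sketch of what a per-run receipt could look like and how it makes runs comparable at a glance. The RunReceipt type, its field names, and the sample values are all hypothetical illustrations, not MartinLoop's actual record format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunReceipt:
    """Hypothetical per-run receipt; field names are illustrative only."""
    agent: str
    budget_spent_usd: float
    verifier_failures: int
    scope_guard_hits: int
    stop_reason: str


def summarize(receipt: RunReceipt) -> str:
    """Render one receipt as a single line that can be compared across runs."""
    return (
        f"{receipt.agent}: ${receipt.budget_spent_usd:.2f} spent, "
        f"{receipt.verifier_failures} verifier failure(s), "
        f"{receipt.scope_guard_hits} scope guard hit(s), "
        f"stopped: {receipt.stop_reason}"
    )


# Made-up sample data, purely for illustration.
receipts = [
    RunReceipt("agent-b", 4.80, 3, 1, "budget cap reached"),
    RunReceipt("agent-a", 1.25, 0, 0, "verifier passed"),
]

# Sort so runs with fewer verifier failures, then lower spend, come first.
ranked = sorted(receipts, key=lambda r: (r.verifier_failures, r.budget_spent_usd))
for r in ranked:
    print(summarize(r))
```

One line per run is enough to rank agents without opening the full logs; the stop reason is kept as free text here, though a fixed set of reasons would likely be easier to aggregate.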
One thing I would genuinely love feedback on from launch-day testers: what proof would you want before trusting a coding agent overnight? Budget receipt, verifier result, rollback path, or a file-level diff trail? MartinLoop is built around making those runs inspectable instead of just fast.
Good work! Do the JSONL records capture rejected paths, or only the committed one?
One lesson from testing coding agents: cost usually spikes after the first failure, not before it. If the agent can't show a receipt proving the task is done, the next retry should get harder, not easier: cap the spend, require a verifier check, and stop when the same mistake repeats. If you've seen a failure mode we should test before launch, I'd love that feedback.
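The three rules in that comment (cap the spend, require a verifier check, stop on a repeated mistake) can be sketched as a small retry gate. This is an illustration under assumptions, not MartinLoop's implementation: the Attempt type, the error_signature field, and the next_action function are hypothetical names, and "same mistake" is approximated as two consecutive attempts with an identical error signature.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Attempt:
    """One agent attempt (hypothetical shape)."""
    cost_usd: float
    verifier_passed: bool
    # Assumed to be a normalized error message; None when the attempt succeeded.
    error_signature: Optional[str]


def next_action(attempts: List[Attempt], budget_cap_usd: float) -> str:
    """Decide whether the agent may retry, and if not, say exactly why."""
    if not attempts:
        return "retry"

    last = attempts[-1]

    # Rule 1: only a verifier pass counts as done.
    if last.verifier_passed:
        return "stop: verified done"

    # Rule 2: total spend across all attempts is capped.
    spent = sum(a.cost_usd for a in attempts)
    if spent >= budget_cap_usd:
        return "stop: budget cap reached"

    # Rule 3: the same failure twice in a row means no new signal.
    if (
        len(attempts) >= 2
        and last.error_signature is not None
        and attempts[-2].error_signature == last.error_signature
    ):
        return "stop: same failure repeated"

    return "retry"


history = [
    Attempt(1.0, False, "TypeError in parse()"),
    Attempt(1.0, False, "TypeError in parse()"),
]
print(next_action(history, budget_cap_usd=5.0))
```

Every non-retry branch returns an explicit stop reason, so the decision itself can be written into the run record.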
A small rule that catches a lot of fake progress: if the agent can't explain what changed since the last attempt in one sentence, it probably should not get another retry yet. That sounds strict, but it saves a lot of budget from "busy" loops that only reshuffle the same failure.
Beginner-friendly agent UX is mostly about predictable stop states. People forgive a small failure if the tool says what happened, what it checked, and why it stopped. They lose trust when the run keeps going with no new signal.
Small update for launch-day testers: the fastest way to try MartinLoop is now:
npx martin-loop demo
The feedback I want most is simple: after an AI coding run fails, what proof would make you trust the next attempt?