I've been using Claude Code daily for months. Short tasks (bug fixes, small features) it nails. But anything that runs for half an hour starts going sideways: it self-approves "done," patches symptoms instead of root causes, or just stops at "code written" before deploying.
Claude Code is great at starting long tasks but bad at finishing them. It self-approves, patches symptoms, fakes TDD, and stops at "code written."
Stagent drives Claude Code through any state machine you define (e.g. plan → verify → review → ship). Each stage runs under a different agent, so it can't self-approve or bail halfway.
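To make the state-machine idea concrete, here's a rough Python sketch of a plan → verify → review → ship workflow. This is only an illustration, not Stagent's actual API or config format: the `Stage` class, the agent names, the "pass"/"fail" verdicts, and the choice of which stage to fall back to on failure are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One node in a hypothetical workflow (not Stagent's real schema)."""
    name: str
    agent: str    # which agent runs this stage (illustrative names)
    on_pass: str  # next stage when this stage's verdict is "pass"
    on_fail: str  # stage to fall back to when the verdict is "fail"


# Every stage is owned by a different agent, so the agent that did the
# work is never the one that signs off on it.
WORKFLOW = {
    "plan":   Stage("plan",   agent="planner",  on_pass="verify", on_fail="plan"),
    "verify": Stage("verify", agent="verifier", on_pass="review", on_fail="plan"),
    "review": Stage("review", agent="reviewer", on_pass="ship",   on_fail="plan"),
    "ship":   Stage("ship",   agent="deployer", on_pass="done",   on_fail="review"),
}


def next_stage(current: str, verdict: str) -> str:
    """Return the stage that follows `current` for the given verdict."""
    stage = WORKFLOW[current]
    if verdict == "pass":
        return stage.on_pass
    if verdict == "fail":
        return stage.on_fail
    raise ValueError(f"unknown verdict: {verdict!r}")


def run(verdicts, start="plan"):
    """Replay a list of verdicts and return the stages visited, in order."""
    path = [start]
    current = start
    for verdict in verdicts:
        if current == "done":
            break
        current = next_stage(current, verdict)
        path.append(current)
    return path
```

The point of the sketch is that "done" is only reachable through `ship`, and a failed verdict sends the task back to an earlier stage instead of letting it stop at "code written."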
Describe your own workflow in plain English with /stagent:create, or fork one from the cookbook: stagent.worldstatelabs.com/cookbook
Plus: live viewer, cross-machine resume.