
Stagent
Drive Claude Code through long tasks it would otherwise drop
54 followers
Drive Claude Code through long tasks it would otherwise drop
54 followers
Claude Code is great at starting long tasks — bad at finishing. It self-approves, patches symptoms, fakes TDD, stops at "code written." Stagent drives Claude Code through any state machine you define (e.g. plan → verify → review → ship). Different agents per stage - it can't self-approve or bail halfway. Describe your own workflow in plain English with /stagent:create, or fork one from the cookbook: stagent.worldstatelabs.com/cookbook Plus: live viewer, cross-machine resume.










Stagent
the failure mode that kills claude code on long tasks is silent context drift. confidence stays high while references to what we agreed 20 messages ago start disappearing. does stagent detect that, or lean on explicit checkpointing? curious if you found a signal that fires before the model thinks it's still on track. congrats on the launch and good luck :)
Stagent
@hiyamojo Great question, and you've put your finger on the real failure mode.
Honest answer: stagent doesn't try to detect silent drift — there's no reliable signal that fires before the model thinks it's on track (if there were, that'd be a paper, not a feature). High confidence plus decaying recall is exactly the case where introspection fails, so we don't lean on it. Instead the design assumption is that drift is inevitable on long horizons, so the goal is to bound its blast radius rather than catch it mid-thought:
- Work is a state machine of discrete stages (plan, execute, review, QA, deploy). The agreed plan is written to a file at stage 1 and is a required input that later stages re-read from disk — not from conversation memory. "What we agreed 20 messages ago" isn't recalled, it's reloaded.
- Each stage emits an artifact with an epoch stamp; transitions only happen through a gate that validates the artifact exists with the right epoch. A stale or contradictory artifact can't silently advance the machine.
- Subagents don't inherit the main thread's drifting context — they bootstrap fresh from the state file plus the plan, so a 200k-token-deep conversation isn't the substrate the actual work runs on.
So: explicit, externalized checkpointing — but structural, not "write a summary every N turns." The bet is that re-grounding every stage in files beats trying to keep one long context coherent. It doesn't catch drift; it makes drift mostly not matter.
Honestly the 'fakes TDD' observation is what got me. It's such a specific and accurate description of the failure mode. Claude Code is weirdly good at looking like it's following process while quietly skipping the hard parts. The state machine idea makes sense because you're basically saying the workflow shouldn't live inside the model's context where it can conveniently forget it. Does the cross-machine resume store the full context or just the stage outputs? Wondering how much of the 'memory' of earlier stages carries forward.