Stagent

Drive Claude Code through long tasks it would otherwise drop

54 followers

AI Coding Agents • AI Engineer

Claude Code is great at starting long tasks and bad at finishing them. It self-approves, patches symptoms, fakes TDD, and stops at "code written." Stagent drives Claude Code through any state machine you define (e.g. plan → verify → review → ship). Each stage runs a different agent, so it can't self-approve or bail halfway. Describe your own workflow in plain English with /stagent:create, or fork one from the cookbook: stagent.worldstatelabs.com/cookbook. Plus: live viewer, cross-machine resume.
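To make the state machine idea concrete, here is a minimal Python sketch of a plan → verify → review → ship workflow in which each stage is owned by a different agent. This is not stagent's actual format or API: the `Stage` class, the `WORKFLOW` table, the agent names, and both functions are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Stage:
    name: str
    agent: str      # hypothetical agent label, not a stagent identifier
    on_pass: str    # stage to enter when this stage's gate passes
    on_fail: str    # stage to fall back to when the gate fails


# Hypothetical workflow table mirroring plan -> verify -> review -> ship.
WORKFLOW: Dict[str, Stage] = {
    "plan": Stage("plan", "planner", "verify", "plan"),
    "verify": Stage("verify", "verifier", "review", "plan"),
    "review": Stage("review", "reviewer", "ship", "plan"),
    "ship": Stage("ship", "shipper", "done", "review"),
}


def check_no_self_approval(workflow: Dict[str, Stage]) -> None:
    """Reject workflows where a stage hands off to a stage run by the same agent."""
    for stage in workflow.values():
        nxt = workflow.get(stage.on_pass)
        if nxt is not None and nxt.agent == stage.agent:
            raise ValueError(f"{stage.name} and {nxt.name} share agent {stage.agent}")


def run(workflow: Dict[str, Stage], start: str,
        verdicts: Dict[str, List[bool]], max_steps: int = 20) -> List[str]:
    """Walk the machine and return the visited stage names in order.

    `verdicts` supplies scripted gate results per stage; a stage with no
    remaining scripted result is treated as passing.
    """
    check_no_self_approval(workflow)
    pending = {name: list(results) for name, results in verdicts.items()}
    visited: List[str] = []
    current = start
    while current != "done" and len(visited) < max_steps:
        stage = workflow[current]
        visited.append(stage.name)
        queue = pending.get(stage.name, [])
        passed = queue.pop(0) if queue else True
        current = stage.on_pass if passed else stage.on_fail
    return visited
```

In this sketch a failed review sends the run back to planning instead of letting the same session declare the work done, and the `max_steps` cap keeps a workflow that never passes from looping forever.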

Free

Launch tags: Productivity • Open Source • Developer Tools

Launch Team

Stagent

Maker

📌

Hey PH, I'm Jie, maker of stagent. I've spent months trying to drive Claude Code through long tasks. The model isn't the problem; the loop is. One long session lets the agent grade its own homework.

- Tell it "look at the codebase first": discipline lasts three turns, then it's coding from memory.

- Tell it "find the root cause": you get try/except: pass.

- Tell it "TDD": it writes the implementation and leaves the test as a TODO.

- Tell it "audit the repo": it skims a few random files, lists "race conditions! null checks!", edits two of them, stops.

I built stagent. It runs any workflow you can describe as a state machine: your stages, your transitions, your gates. Each stage runs a different agent, so it can't self-approve or bail halfway.

Describe what you want in plain English and /stagent:create scaffolds the whole workflow for you. Or browse the cookbook for inspiration, 14 long-task patterns we kept hitting ourselves: https://stagent.worldstatelabs.c...

Plus: live browser viewer, cross-machine resume.

What's the long task Claude Code keeps half-finishing for you?

2mo ago

the failure mode that kills claude code on long tasks is silent context drift. confidence stays high while references to what we agreed 20 messages ago start disappearing. does stagent detect that, or lean on explicit checkpointing? curious if you found a signal that fires before the model thinks it's still on track. congrats on the launch and good luck :)

1mo ago

Stagent

Maker

@hiyamojo Great question, and you've put your finger on the real failure mode.

Honest answer: stagent doesn't try to detect silent drift — there's no reliable signal that fires before the model thinks it's on track (if there were, that'd be a paper, not a feature). High confidence plus decaying recall is exactly the case where introspection fails, so we don't lean on it. Instead the design assumption is that drift is inevitable on long horizons, so the goal is to bound its blast radius rather than catch it mid-thought:

- Work is a state machine of discrete stages (plan, execute, review, QA, deploy). The agreed plan is written to a file at stage 1 and is a required input that later stages re-read from disk — not from conversation memory. "What we agreed 20 messages ago" isn't recalled, it's reloaded.

- Each stage emits an artifact with an epoch stamp; transitions only happen through a gate that validates the artifact exists with the right epoch. A stale or contradictory artifact can't silently advance the machine.

- Subagents don't inherit the main thread's drifting context — they bootstrap fresh from the state file plus the plan, so a 200k-token-deep conversation isn't the substrate the actual work runs on.

So: explicit, externalized checkpointing — but structural, not "write a summary every N turns." The bet is that re-grounding every stage in files beats trying to keep one long context coherent. It doesn't catch drift; it makes drift mostly not matter.
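The three points above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not stagent's implementation: the file names (`plan.md`, `state.json`, one `<stage>.json` artifact per stage), the JSON fields, and every function name here are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def write_artifact(run_dir: Path, stage: str, epoch: int, body: str) -> None:
    """Persist a stage's output together with the epoch it was produced in."""
    payload = {"stage": stage, "epoch": epoch, "body": body}
    (run_dir / f"{stage}.json").write_text(json.dumps(payload))


def gate(run_dir: Path, stage: str, expected_epoch: int) -> bool:
    """Allow a transition only if the artifact exists and carries the right epoch."""
    path = run_dir / f"{stage}.json"
    if not path.exists():
        return False
    artifact = json.loads(path.read_text())
    return artifact.get("stage") == stage and artifact.get("epoch") == expected_epoch


def bootstrap_context(run_dir: Path) -> dict:
    """Build a fresh stage context from disk only, never from chat history."""
    return {
        "plan": (run_dir / "plan.md").read_text(),
        "state": json.loads((run_dir / "state.json").read_text()),
    }


def demo() -> List[bool]:
    """Exercise the gate with a missing, a stale, and a current artifact."""
    results: List[bool] = []
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp)
        (run_dir / "plan.md").write_text("1. reproduce bug\n2. fix root cause\n")
        (run_dir / "state.json").write_text(json.dumps({"stage": "execute", "epoch": 2}))

        results.append(gate(run_dir, "execute", 2))   # no artifact yet
        write_artifact(run_dir, "execute", 1, "output from an earlier loop")
        results.append(gate(run_dir, "execute", 2))   # artifact has a stale epoch
        write_artifact(run_dir, "execute", 2, "fresh output")
        results.append(gate(run_dir, "execute", 2))   # artifact matches the epoch
        results.append(bootstrap_context(run_dir)["plan"].startswith("1. reproduce"))
    return results
```

The point of the sketch is that nothing in `gate` or `bootstrap_context` reads conversation history: a stage can only advance on what is on disk, and a new stage only sees the plan and state files.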

1mo ago

Honestly the 'fakes TDD' observation is what got me. It's such a specific and accurate description of the failure mode. Claude Code is weirdly good at looking like it's following process while quietly skipping the hard parts. The state machine idea makes sense because you're basically saying the workflow shouldn't live inside the model's context where it can conveniently forget it. Does the cross-machine resume store the full context or just the stage outputs? Wondering how much of the 'memory' of earlier stages carries forward.

1mo ago

Forum Threads

p/stagent • 2mo ago

What's the longest task you've ever tried to get Claude Code through?

I've been using Claude Code daily for months. Short tasks (bug fixes, small features) it nails. But anything that runs for half an hour starts going sideways: it self-approves "done," patches symptoms instead of root causes, or just stops at "code written" before deploying.

Curious what other people are pushing it through:
