
Deep Work Plan
Models matter. Context matters more. Give your agent a plan.
133 followers
Deep Work Plan turns any repo into a harness that carries the context of your best engineer, so any AI agent codes like your smartest model and can't drift from the plan. It isn't a chat window the agent forgets; it's a spec written into the repo, with atomic tasks, acceptance criteria, validation gates, and resumable state. Long runs survive context resets, and any agent can pick up where the last one left off. Point an agent at it, walk away, and come back to work you can verify. Any agent, any repo, no lock-in. Open source, MIT-licensed.

Deep Work Plan
Writing the plan into the repo rather than the context window is the right architecture. Durable state that survives model swaps and context resets is what makes long multi-step tasks actually viable. The validation gate pattern catches drift before it compounds. How are the gates implemented? Are they executable assertions the agent runs itself, or do they require human sign-off?
Deep Work Plan
@anand_thakkar1 Glad you went straight to the gates; that's where most of the design weight sits. Short answer: they're executable assertions the agent runs itself. Human sign-off bookends the run rather than living inside it: a person approves the plan before execution starts and reviews the final diff at PR time. Execution in between is autonomous, which is the whole point.
Every task has a `Validation` section that names concrete commands, typically the repo's own quality gate (tests + lint + type-check for a TS repo, `cargo test && cargo clippy` for Rust, `pytest && ruff` for Python). DWP doesn't impose a test runner; it reads whatever the project already considers "the code is healthy" and binds the task to that. A task isn't marked `[x]` done unless those commands exit 0, and tasks that change behavior have to extend the suite, so the gate isn't "existing tests pass" but "the new tests that prove this works pass too."
On failure, the task is marked `[!]` blocked and the agent stops. The failure surfaces in the progress log instead of getting buried in a chat transcript, which is the cue to `refine` the plan or pull a human in, not to push through.
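A minimal sketch of the gate semantics described above: run each validation command in order and mark the task `[x]` only if every one exits 0, otherwise `[!]`. The function name `run_gate` and the stop-at-first-failure behavior are assumptions for illustration, not DWP's actual implementation; in DWP the commands come from the task's `Validation` section.

```python
import subprocess
import sys
from typing import Sequence

# Markers from the thread: "[x]" = passed its gate, "[!]" = blocked.
DONE, BLOCKED = "[x]", "[!]"


def run_gate(commands: Sequence[Sequence[str]]) -> str:
    """Hypothetical gate runner: every command must exit 0 for the task to pass.

    Stops at the first failure so the agent halts instead of pushing through.
    """
    for cmd in commands:
        try:
            result = subprocess.run(list(cmd), capture_output=True, text=True)
        except FileNotFoundError:
            # A missing tool counts as a failed gate, not a skipped one.
            print(f"gate failed: {cmd[0]} not found", file=sys.stderr)
            return BLOCKED
        if result.returncode != 0:
            print(f"gate failed: {' '.join(cmd)} (exit {result.returncode})",
                  file=sys.stderr)
            return BLOCKED
    return DONE
```

For a Python repo a gate might be `run_gate([["pytest"], ["ruff", "check", "."]])`; a Rust repo would pass `cargo test` and `cargo clippy` instead. Stopping at the first non-zero exit keeps the failure at the top of the progress log.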
One plan-level gate worth calling out: every plan ends with a mandatory security-analysis pass over the whole change set, plus a skill-discovery pass that proposes new reusable skills from what was built.
Take a look at the Core loop write-up with the full validation and completion protocol: https://deepworkplan.com/methodology/02-core-loop/
Mailwarm
How do you keep the plan from getting stale as humans change the codebase between runs?
Deep Work Plan
@naimz Great question. Honestly, this is the failure mode I worried about most while designing this, because it's the one most "just write a spec" approaches quietly ignore. The plan isn't a snapshot of the code, so it doesn't rot like one. DWP handles drift on three fronts.
The first is that I write tasks as behavior, not edits. An acceptance criterion in a DWP task reads like "`POST /login` rate-limits to 10 attempts per IP per minute and returns 429 with a `Retry-After` header," not "add a Redis client in `auth.ts` and wrap the handler." So if a teammate swaps the store for an in-memory cache between runs, lifts the check into a CDN rule, or just renames the file, nothing in my plan is invalidated; the criterion is still expressible against the current code.
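To make "behavior, not edits" concrete, here is a hedged sketch. `make_login_handler` is a hypothetical stand-in for `POST /login` that uses an in-memory sliding window (an assumption; the thread doesn't specify the limiting algorithm), and `meets_criterion` encodes the acceptance criterion using only status codes and headers. Because the check never looks at the store, swapping the limiter for Redis or a CDN rule leaves it valid.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Response = Tuple[int, Dict[str, str]]  # (status code, headers)
LIMIT, WINDOW = 10, 60.0  # 10 attempts per IP per 60 seconds


def make_login_handler() -> Callable[[str, float], Response]:
    """Hypothetical handler with an in-memory sliding-window limiter."""
    attempts: Dict[str, List[float]] = defaultdict(list)

    def handle(ip: str, now: float) -> Response:
        # Keep only attempts that are still inside the window.
        recent = [t for t in attempts[ip] if now - t < WINDOW]
        attempts[ip] = recent
        if len(recent) >= LIMIT:
            retry = math.ceil(WINDOW - (now - recent[0]))
            return 429, {"Retry-After": str(retry)}
        recent.append(now)
        return 200, {}

    return handle


def meets_criterion(handle: Callable[[str, float], Response]) -> bool:
    """The acceptance criterion, phrased purely as observable behavior."""
    for i in range(LIMIT):
        status, _ = handle("203.0.113.7", float(i))
        if status == 429:
            return False  # limited too early
    status, headers = handle("203.0.113.7", 10.0)
    if status != 429 or "Retry-After" not in headers:
        return False
    # Another IP is unaffected, and the first IP recovers once the window slides.
    other, _ = handle("198.51.100.1", 10.0)
    later, _ = handle("203.0.113.7", 61.0)
    return other == 200 and later == 200
```

A handler that never limits fails the same check, whatever file or store it lives in.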
The second is that every task carries its own validation gate, and the gate re-runs against the repo as it is right now, not the repo as it was when I wrote the plan. So if someone broke an assumption between runs, the next run fails loudly at that gate instead of drifting silently, and that failure is my cue to `refine` before continuing, not to paper over it.
The third is that I made keeping the spec in sync with the code part of the work, not a separate chore. Any DWP task that changes behavior also updates the `docs/`, `AGENTS.md`, and `.agents/` kit that describe it; re-syncing the repo's agent-facing surface is part of the task's validation gate. On top of that, every plan ends with a security-analysis pass and a skill-discovery step that proposes new reusable skills from what was just built. It's basically the Boy Scout rule applied to the harness: every run is meant to leave the codebase a little more agent-ready than it found it, not more stale.
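One way the re-sync requirement could be expressed as part of a gate, sketched under assumptions. The agent-facing paths (`docs/`, `AGENTS.md`, `.agents/`) come from the comment above; the test-file naming heuristics and the `sync_gate` function itself are hypothetical.

```python
from typing import List, Sequence


def is_agent_surface(path: str) -> bool:
    """Paths named in the thread as the repo's agent-facing surface."""
    return path == "AGENTS.md" or path.startswith(("docs/", ".agents/"))


def is_test(path: str) -> bool:
    """Hypothetical heuristic for common test-file layouts."""
    name = path.rsplit("/", 1)[-1]
    return (path.startswith("tests/") or "/tests/" in path
            or name.startswith("test_")
            or any(m in name for m in ("_test.", ".test.", ".spec.")))


def sync_gate(changed: Sequence[str], changes_behavior: bool) -> List[str]:
    """Return reasons the change set fails the re-sync part of its gate."""
    problems: List[str] = []
    if not changes_behavior:
        return problems
    if not any(is_agent_surface(p) for p in changed):
        problems.append("behavior changed but docs/, AGENTS.md and .agents/ untouched")
    if not any(is_test(p) for p in changed):
        problems.append("behavior changed but no tests were added or extended")
    return problems
```

An empty list means the re-sync check passes; anything else blocks the task alongside the rest of its gate.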
If you want the longer take on why we built it this way, the methodology write-up walks through it: https://deepworkplan.com/methodology/
Spec-written-into-the-repo is the right model: a persistent plan that survives context resets and that any agent can pick up is fundamentally different from a prompt you're manually re-feeding each session. The acceptance criteria + validation gates combo is the piece most agent frameworks don't bother with. Curious how it handles cases where the atomic tasks turn out to be wrong mid-run: can you edit and resume without blowing away the state?
Deep Work Plan
@galdayan That comes up pretty often in real work, so handling it cleanly was a core design goal, not an afterthought. The methodology can refine a plan at any point, including after it's already been partially executed, without throwing away the work that's done.
Here's why that's safe: the state and the task definitions are kept separate. The plan is a checklist on disk plus a small state file, so which tasks are already done is recorded durably, independent of the task text. When a task turns out to be wrong mid-run, the agent doesn't push through. It marks that task blocked and stops, which surfaces the problem instead of burying it in a chat transcript.
From there you refine the plan: edit, reorder, split, or drop the tasks that haven't run yet, while the completed ones stay completed. The refinement only touches the open part of the plan, so nothing that already passed its validation gate gets blown away. Then you resume, and it rebuilds state from disk plus the actual repo and continues where it left off, re-running the gates so nothing that shifted underneath slips by.
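A minimal sketch of why mid-run refinement is safe: completed task IDs live in a separate state file, so rewriting the open tasks can't touch them. The JSON layout (`{"done": [...]}`), the task dicts, and the function names are all assumptions for illustration.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional, Set

Task = Dict[str, str]  # hypothetical shape: {"id": ..., "text": ...}


def load_done(state_file: Path) -> Set[str]:
    """Completed task IDs are recorded apart from the task text."""
    if not state_file.exists():
        return set()
    return set(json.loads(state_file.read_text())["done"])


def save_done(state_file: Path, done: Set[str]) -> None:
    state_file.write_text(json.dumps({"done": sorted(done)}))


def refine(plan: List[Task], done: Set[str], new_open: List[Task]) -> List[Task]:
    """Rewrite only the open part of the plan; completed tasks stay completed."""
    clash = {t["id"] for t in new_open} & done
    if clash:
        raise ValueError(f"cannot redefine completed tasks: {sorted(clash)}")
    kept = [t for t in plan if t["id"] in done]
    return kept + new_open


def next_task(plan: List[Task], done: Set[str]) -> Optional[Task]:
    """Resume point: the first task not yet recorded as done."""
    return next((t for t in plan if t["id"] not in done), None)
```

A refinement that tries to reuse a completed task's ID is rejected rather than silently overwriting finished work, and `next_task` gives the point to resume from.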
So editing a plan mid-run is a normal, first-class move, not a reset. That's the whole reason the plan lives on disk instead of the chat: you can rewrite the route without losing the miles already driven.
Full loop here: https://deepworkplan.com/methodology/02-core-loop/
"Context matters more than the model" is the lesson it took me a year of vibe coding to actually believe. My best and worst sessions use the same model... the difference is whether I handed it a real plan or just vibes. The part I still fight is drift, the agent quietly wandering off the plan three steps in. Does Deep Work Plan keep checking the work back against the plan, or is the plan mostly an upfront thing?
Deep Work Plan
@luca_capone You nailed the exact problem it's built for. The plan is not an upfront artifact you write once and hope the agent honors; it's the thing the agent executes against, task by task, and the agent doesn't get to declare victory until the work is actually validated.
Concretely, drift gets fought in three places:
1. One task at a time, not the whole goal at once. The plan is decomposed into small, self-contained tasks. The agent works a single task, then has to stop and check itself before moving on, so it can only wander one step, not three.
2. Every task carries its own acceptance criteria plus a validation gate. "Done" isn't the agent's opinion: it's a checklist plus the exact commands or tests that prove it (tests, lint, type-check, build). The agent runs them before marking the task complete. If they fail, the task isn't done, full stop.
3. Progress is written down in the repo as it goes. Each task gets an explicit status marker (not started / in progress / done / blocked) and a log. So drift becomes visible: you (or the next agent, or the next session) can see exactly where it is versus where the plan said it should be, and resume from the first incomplete task without redoing finished work.
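The status markers make the resume point mechanical to find. This sketch parses a markdown checklist and returns the first task that isn't done. `[x]` and `[!]` are the markers used earlier in the thread; `[ ]` for not started and `[~]` for in progress are assumptions, as is the exact line format.

```python
import re
from typing import List, Optional, Tuple

# "[x]" (done) and "[!]" (blocked) come from the thread; "[ ]" and "[~]"
# for not started / in progress are assumptions for this sketch.
STATUS = {" ": "not started", "~": "in progress", "x": "done", "!": "blocked"}
LINE = re.compile(r"^\s*- \[(.)\] (.+)$")


def parse_plan(text: str) -> List[Tuple[str, str]]:
    """Return (status, title) pairs for every checklist line in the plan."""
    tasks = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m and m.group(1) in STATUS:
            tasks.append((STATUS[m.group(1)], m.group(2).strip()))
    return tasks


def resume_point(tasks: List[Tuple[str, str]]) -> Optional[str]:
    """First task that is not done; finished work is never redone."""
    for status, title in tasks:
        if status != "done":
            return title
    return None
```

A blocked task is deliberately treated as the resume point, so the next session starts at the problem rather than past it.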
So to your question directly: the plan is a continuous check, not an upfront thing. And yes, a plan isn't finished until everything validates, including mandatory end-of-plan review tasks (e.g. a security pass over the whole change set). The agent can't quietly call it done with a gate still red.
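The "not finished until everything validates" rule fits in a single predicate. In this hypothetical sketch, a plan is finished only when every gate is green and the mandatory end-of-plan reviews are present; identifying them by title (`"security analysis"`, `"skill discovery"`) is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    title: str
    gate_passed: bool
    end_of_plan_review: bool = False  # e.g. the security pass over the change set


# Hypothetical: the two end-of-plan steps mentioned in the thread.
REQUIRED_REVIEWS = {"security analysis", "skill discovery"}


def plan_finished(tasks: List[Task]) -> bool:
    """Finished only if every gate is green and all mandatory reviews exist."""
    if any(not t.gate_passed for t in tasks):
        return False  # a red gate anywhere blocks completion
    reviews = {t.title for t in tasks if t.end_of_plan_review}
    return REQUIRED_REVIEWS <= reviews
```

Because the first check already requires every gate to be green, a review that is present is also a review that passed.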
The honest caveat: it can't stop an agent from writing a weak acceptance criterion in the first place. Garbage gate in, garbage gate out. But it makes drift loud instead of silent, which, as you said, is most of the battle.
The "repo as harness" idea is clever — giving agents durable context instead of a fresh chat window every time is exactly what long-horizon tasks need. Context drift is probably the #1 reason agent work falls apart mid-task.
Is the plan file something you generate once and manually update, or does it evolve automatically as the codebase changes?
Deep Work Plan
@doganakbulut Good instinct to ask what happens after the initial harness setup; that's where it actually gets interesting. The short version: it's neither of the two options you posed. You're not hand-maintaining it, and it isn't blindly syncing itself to the code either. It's generated once, and then it keeps maintaining itself as part of the work, which is really the whole point of the methodology.
You start from a goal, and DWP decomposes it into atomic tasks, each with acceptance criteria and a validation gate. From there, that file is the source of truth. I deliberately don't auto-rewrite it from code diffs. If the spec just chases the code, the code becomes the truth and the spec turns into a lagging mirror, which is the exact drift we're trying to kill. So it evolves on purpose, not silently: every task's gate re-runs against the repo as it is right now, so when something changes between runs the gate fails loudly instead of rotting in silence, and that's the cue to `refine`. The agent does that refinement during the run; you mostly bookend it, approving the plan up front and reviewing the diff at PR time.
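Putting the pieces together, here is a hedged sketch of one run: tasks are worked one at a time, every gate is re-run against the repo as it is now (including tasks already marked done), and the first red gate marks that task blocked and stops the run with a cue to refine. `run_plan`, `do_work`, and `gate` are hypothetical names; in practice the gate would execute the task's validation commands.

```python
from typing import Callable, Dict, List

Gate = Callable[[str], bool]  # hypothetical: checks a task against the current repo
Work = Callable[[str], None]  # hypothetical: the agent's work on a single task


def run_plan(tasks: List[str], status: Dict[str, str], do_work: Work, gate: Gate) -> str:
    """Work tasks in order, one at a time, stopping at the first red gate."""
    for task in tasks:
        was_done = status.get(task) == "done"
        if not was_done:
            do_work(task)
        # Gates re-run even for finished tasks: the repo may have changed.
        if gate(task):
            status[task] = "done"
            continue
        status[task] = "blocked"
        reason = "no longer passes" if was_done else "failed its gate"
        return f"stop and refine: {task!r} {reason}"
    return "all gates green"
```

The run never "pushes through" a red gate; it returns control so the plan can be refined before anything else executes.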
The part I'd really stress: keeping things current is work the plan does, not a separate chore. Any task that changes behavior also updates the `docs/`, `AGENTS.md`, and `.agents/` kit that describe it, and extends the tests that prove it, and that re-sync is part of the task's own validation gate. On top of that, every plan closes with a security-analysis pass and a skill-discovery step that turns what was just built into reusable skills. So the docs and tests evolve alongside the code by construction. The agent is continuously self-documenting, instead of leaving a stale spec behind.
It's basically the Boy Scout rule applied to the harness: every run leaves the repo a little more agent-ready than it found it, not more stale.
Longer take in the methodology write-up: https://deepworkplan.com/methodology/