Launching today

Deep Work Plan
Models matter. Context matters more. Give your agent a plan.
Deep Work Plan turns any repo into a harness with the context of your best engineer — so any AI agent codes like your smartest model and can't drift from the plan. Not a chat window it forgets, a spec written into the repo: atomic tasks, acceptance criteria, validation gates, resumable state. Long runs survive context resets; any agent picks up where the last left off. Point an agent at it, walk away, come back to work you can verify. Any agent, any repo, no lock-in. Open Source, MIT.

Deep Work Plan
Writing the plan into the repo rather than the context window is the right architecture. Durable state that survives model swaps and context resets is what makes long multi-step tasks actually viable. The validation gate pattern catches drift before it compounds. How are the gates implemented? Are they executable assertions the agent runs itself, or do they require human sign-off?
Deep Work Plan
@anand_thakkar1 Glad you went straight to the gates; that's where most of the design weight sits. Short answer: executable assertions the agent runs itself. Human sign-off bookends the run rather than living inside it: a person approves the plan before execution starts and reviews the final diff at PR time. Execution in between is autonomous, which is the whole point.
Every task has a `Validation` section that names concrete commands, typically the repo's own quality gate (tests + lint + type-check for a TS repo, `cargo test && cargo clippy` for Rust, `pytest && ruff check` for Python). DWP doesn't impose a test runner; it reads whatever the project already considers "the code is healthy" and binds the task to that. A task isn't marked `[x]` done unless those commands exit 0, and tasks that change behavior have to extend the suite, so the gate isn't "existing tests pass" but "the new tests that prove this works pass too."
On failure, the task is marked `[!]` blocked and the agent stops. The failure surfaces in the progress log instead of getting buried in a chat transcript, which is the cue to `refine` the plan or pull a human in, not to push through.
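To make the pass/fail rule concrete, here is a minimal sketch of a gate runner, not DWP's actual implementation. The function name `run_gate` and the log line format are hypothetical; only the `[x]` and `[!]` markers and the "every command must exit 0" rule come from the description above.

```python
import subprocess
import sys


def run_gate(commands):
    """Run each validation command in order and stop at the first failure.

    Returns (marker, log): "[x]" if every command exited 0, otherwise "[!]".
    `run_gate` is a hypothetical name used only for this illustration.
    """
    if not commands:
        # A task with no validation commands cannot prove anything.
        raise ValueError("a gate needs at least one command")
    log = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        log.append(f"{' '.join(cmd)} -> exit {result.returncode}")
        if result.returncode != 0:
            # Blocked: record the failure and stop instead of pushing through.
            return "[!]", log
    return "[x]", log


# Stand-in commands so the sketch runs anywhere. A real task would name the
# repo's own gate, for example ["pytest"] followed by ["ruff", "check"].
passing = [[sys.executable, "-c", "pass"]]
failing = [
    [sys.executable, "-c", "raise SystemExit(3)"],
    [sys.executable, "-c", "pass"],
]

print(run_gate(passing)[0])
print(run_gate(failing)[0])
```

The returned log lines are what would land in the progress log, so a failure stays visible after the chat transcript is gone.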
One plan-level gate worth calling out: every plan ends with a mandatory security-analysis pass over the whole change set, plus a skill-discovery pass that proposes new reusable skills from what was built.
The Core loop write-up covers the full validation and completion protocol: https://deepworkplan.com/methodology/02-core-loop/
Mailwarm
How do you keep the plan from getting stale as humans change the codebase between runs?
Deep Work Plan
@naimz Great question. Honestly, this is the failure mode I worried about most while designing DWP, because it's the one most "just write a spec" approaches quietly ignore. The plan isn't a snapshot of the code, so it doesn't rot like one. DWP handles drift on three fronts.
The first is that I write tasks as behavior, not edits. An acceptance criterion in a DWP task reads like "`POST /login` rate-limits to 10 attempts per IP per minute and returns 429 with a `Retry-After` header," not "add a Redis client in `auth.ts` and wrap the handler." So if a teammate swaps the store for an in-memory cache between runs, lifts the check into a CDN rule, or just renames the file, nothing in my plan is invalidated: the criterion is still expressible against the current code.
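As an illustration of a criterion written as behavior, the sketch below checks the rate-limit example purely through status codes and headers. Everything here is hypothetical: `InMemoryLoginLimiter` is a stand-in for whatever enforces the limit today, and `check_rate_limit_criterion` is an invented name. The check never looks at how the limit is stored, so it would survive the swaps described above.

```python
from collections import defaultdict, deque

LIMIT = 10            # attempts allowed per IP per window, from the criterion
WINDOW_SECONDS = 60   # "per minute"


class InMemoryLoginLimiter:
    """Hypothetical stand-in implementation; the check below does not depend on it."""

    def __init__(self):
        self._attempts = defaultdict(deque)

    def post_login(self, ip, now):
        """Return (status, headers) for a login attempt from `ip` at time `now`."""
        window = self._attempts[ip]
        # Drop attempts that have aged out of the window.
        while window and now - window[0] >= WINDOW_SECONDS:
            window.popleft()
        if len(window) >= LIMIT:
            retry_after = int(WINDOW_SECONDS - (now - window[0]))
            return 429, {"Retry-After": str(max(retry_after, 1))}
        window.append(now)
        return 200, {}


def check_rate_limit_criterion(post_login):
    """Acceptance check that only observes status codes and headers."""
    for i in range(LIMIT):
        status, _ = post_login("203.0.113.7", now=float(i))
        assert status != 429, f"attempt {i + 1} was rejected too early"
    # The next attempt inside the same minute must be rejected.
    status, headers = post_login("203.0.113.7", now=10.0)
    assert status == 429
    assert "Retry-After" in headers
    # A different IP is unaffected.
    status, _ = post_login("203.0.113.8", now=10.0)
    assert status != 429
    return True


print(check_rate_limit_criterion(InMemoryLoginLimiter().post_login))
```

Because the check takes `post_login` as a parameter, the same function could be pointed at a Redis-backed handler or an HTTP client wrapper without changing a line of it.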
The second is that every task carries its own validation gate, and the gate re-runs against the repo as it is right now, not the repo as it was when I wrote the plan. So if someone broke an assumption between runs, the next run fails loudly at that gate instead of drifting silently, and that failure is my cue to `refine` before continuing, not to paper over it.
The third is that I made keeping the spec in sync with the code part of the work, not a separate chore. Any DWP task that changes behavior also updates the `docs/`, `AGENTS.md`, and `.agents/` kit that describe it; re-syncing the repo's agent-facing surface is part of the task's validation gate. On top of that, every plan ends with a security-analysis pass and a skill-discovery step that proposes new reusable skills out of what was just built. It's basically the Boy Scout rule applied to the harness: every run is meant to leave the codebase a little more agent-ready than it found it, not more stale.
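One crude way to picture a re-sync check is as a rule over the set of changed paths. This is only an assumption for illustration, not how DWP implements it: the names `is_agent_surface` and `resync_gate` are hypothetical, and the heuristic treats every change outside the agent-facing surface as a behavior change, which a real gate would need to refine.

```python
def is_agent_surface(path):
    """True for paths in the agent-facing surface named in the thread."""
    return path == "AGENTS.md" or path.startswith(("docs/", ".agents/"))


def resync_gate(changed_paths):
    """Pass only if a change set that touches code also touches the surface.

    Hypothetical heuristic: a docs-only change set passes, and a code change
    with no accompanying docs, AGENTS.md, or .agents/ update fails.
    """
    code = [p for p in changed_paths if not is_agent_surface(p)]
    surface = [p for p in changed_paths if is_agent_surface(p)]
    return not code or bool(surface)


print(resync_gate(["src/auth.py", "docs/auth.md"]))
print(resync_gate(["src/auth.py"]))
```

A path-level rule like this only shows that the docs were touched, not that they are accurate, so it would complement review rather than replace it.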
If you want the longer take on why we built it this way, the methodology write-up walks through it: https://deepworkplan.com/methodology/
The "repo as harness" idea is clever — giving agents durable context instead of a fresh chat window every time is exactly what long-horizon tasks need. Context drift is probably the #1 reason agent work falls apart mid-task.
Is the plan file something you generate once and manually update, or does it evolve automatically as the codebase changes?
Deep Work Plan
@doganakbulut Good instinct to ask what happens after the initial harness setup; that's where it actually gets interesting. The short version: it's neither of the two options you posed. You're not hand-maintaining it, and it isn't blindly syncing itself to the code either. It's generated once, and then it keeps maintaining itself as part of the work, which is really the whole point of the methodology.
You start from a goal, and DWP decomposes it into atomic tasks, each with acceptance criteria and a validation gate. From there, that file is the source of truth. I deliberately don't auto-rewrite it from code diffs. If the spec just chases the code, the code becomes the truth and the spec turns into a lagging mirror, which is the exact drift we're trying to kill. So it evolves on purpose, not silently: every task's gate re-runs against the repo as it is right now, so when something changes between runs the gate fails loudly instead of rotting in silence, and that's the cue to `refine`. The agent does that refinement during the run; you mostly bookend it, approving the plan up front and reviewing the diff at PR time.
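The shape of such a plan can be sketched as a small data structure. This is an assumed model, not DWP's file format: the `Task` fields and `next_task` are hypothetical names, and the `[ ]` pending marker is borrowed from the usual checkbox convention, while `[x]` and `[!]` are the markers mentioned elsewhere in this thread.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical in-memory view of one atomic task in a plan."""
    title: str
    acceptance_criteria: list[str]
    validation: list[str]      # commands that must exit 0
    status: str = "[ ]"        # "[ ]" pending (assumed), "[x]" done, "[!]" blocked


def next_task(plan):
    """Return the first task that is not done, or None when the plan is complete.

    A blocked task is returned as well: the run should stop there and refine
    the plan rather than skip ahead.
    """
    for task in plan:
        if task.status != "[x]":
            return task
    return None


plan = [
    Task(
        "Rate-limit POST /login",
        ["returns 429 with Retry-After after 10 attempts per IP per minute"],
        ["pytest"],
        status="[x]",
    ),
    Task(
        "Document the login rate limit",
        ["docs describe the limit and the 429 response"],
        ["pytest"],
    ),
]

print(next_task(plan).title)
```

Because the status lives with the task rather than in a conversation, any agent that loads the plan can find the resume point with a single pass.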
The part I'd really stress: keeping things current is work the plan does, not a separate chore. Any task that changes behavior also updates the `docs/`, `AGENTS.md`, and `.agents/` kit that describe it, and extends the tests that prove it, and that re-sync is part of the task's own validation gate. On top of that, every plan closes with a security-analysis pass and a skill-discovery step that turns what was just built into reusable skills. So the docs and tests evolve alongside the code by construction. The agent is continuously self-documenting, instead of leaving a stale spec behind.
It's basically the Boy Scout rule applied to the harness: every run leaves the repo a little more agent-ready than it found it, not more stale.
Longer take in the methodology write-up: https://deepworkplan.com/methodology/
spec-written-into-the-repo is the right model - a persistent plan that survives context resets and that any agent can pick up is fundamentally different from a prompt you're manually re-feeding each session. the acceptance criteria + validation gates combo is the piece most agent frameworks don't bother with. curious how it handles cases where the atomic tasks turn out to be wrong mid-run - can you edit and resume without blowing the state?