PMB - Stop re-explaining your project to AI coding agents
by•
PMB gives Claude Code, Cursor, Codex and Zed persistent project memory through MCP. It stores decisions, lessons, goals, recent work, project facts and docs in one SQLite workspace on your disk. No cloud, no API keys, no LLM call on the read path. It is open source, offline-first, inspectable/exportable, with a local dashboard and honest impact tracking so you can see which memories actually help.


Replies
We run agents across multiple client projects simultaneously, so stale memory leaking into the wrong context is a real operational risk. The keyed fact system handling latest-wins with old value archived covers simple attribute updates, but I'm curious how it handles decisions that don't have a clean key (for exmaple, an architectural direction that got reversed mid-project without an explicit "we switched from X to Y"). Does the conflict surface in the dashboard, or does the old decision just keep scoring well on BM25 until someone manually archives it?
PMB
You're hitting two distinct things - isolation and staleness - so let me split them.
Cross-project leakage is handled by isolation, not ranking. Each project/client is its own workspace with a separate store: its own SQLite events, vector index, BM25 index and graph under ~/.pmb/workspaces/<id>/. Recall is scoped to the active workspace, so one client's memory can't score into another's context. Leakage across clients isn't a ranking problem here - it's walled off.
The keyless reversed decision is the honest gap, and you called it exactly: an architectural direction that flipped mid-project, no clean key, no explicit "we switched from X to Y." Today there's no semantic conflict-detection, so the old decision keeps scoring on BM25 + vectors until recency/forgetting-curve decay down-weights it, a correction overrides it, or someone archives it. The dashboard shows the timeline and both decisions, but it doesn't currently auto-flag that the two conflict - so worst case, yes, the stale one scores well until it's manually archived. I won't pretend otherwise.
That's precisely the next build: reversal/conflict detection that links a new decision to the one it supersedes even without a clean key, plus a needs-review surface in the dashboard that flags candidate conflicts for one-click supersede/archive, so it isn't on a human to notice. For a multi-client setup like yours that's the difference between trusting it and auditing it constantly, so it's high on the list.
Repo if you want to follow where it goes: https://github.com/oleksiijko/pmb
This is exactly the problem that makes AI coding feel like a conversation reset every 5 minutes. You paste context, it forgets, you paste again. The local-first memory angle is smart - keeping it in the project rather than some cloud sync feels like the right call for sensitive codebases. Does it handle monorepos where different agents might need different context scopes?
PMB
@omri_ben_shoham1 Yeah, "conversation reset every 5 minutes" is exactly it.
Two ways today. For hard separation, a workspace isn't locked to a git repo - it's just an isolated store - so you can run one per package/app in the monorepo, and different agents pointing at different workspaces get genuinely different scopes that can't bleed into each other. For a single shared workspace, scoping is by relevance plus the code-entity graph: an agent working in package A surfaces memory tied to the paths and entities it's touching, not the whole monorepo. The fully airtight version - a hard locality gate so an agent only ever sees memory for the subtree it's in - is on the roadmap; today that scoping is emergent from ranking rather than enforced. No dedicated "monorepo mode" config yet, but those two primitives cover it in practice.
Repo if you want to follow along: https://github.com/oleksiijko/pmb
DiffSense
But remembering also has a cost. Your front loading context. aka using a lot of the available context before solving a problem. That means your runway to solve it is smaller. How do you get around that? some times you need all the context runway you have 😅
PMB
@conduit_design Ha, this is the sharpest version of the tradeoff - and you're right, remembering isn't free; it competes with the working window.
What keeps it cheap is that PMB is selective retrieval, not a dump. Recall is top-k and relevance-ranked (BM25 + vectors + graph), so what gets injected is a handful of items actually tied to the task - usually a few hundred tokens, not the whole store. prepare() at the start is a compact summary (counts, a few surfaced lessons, open goals), and recall() is pulled on demand mid-task rather than front-loading everything up front.
The other half: the baseline isn't an empty window, it's you re-pasting context every session or a fat always-on rules file that sits in context regardless of relevance. PMB swaps "always-on everything" for "on-demand relevant slice," so in practice it usually buys back more runway than it spends. And top-k is tunable - for a context-hungry task you keep the footprint minimal. There's still a nonzero floor, I won't pretend otherwise, but the whole design is "smallest slice that changes the answer," precisely because runway matters.
Repo if you want to follow along: https://github.com/oleksiijko/pmb
DiffSense
@oleksiijko Thanks for your answer. Any benchmarks for this? like head to head. Make a todo list app. with and without your smart memory mcp. And then run over 10 times or so against an agent not using PMB? and then viewing token use etc? and quality of delivery?
PMB
@conduit_design Exactly the right thing to ask for, and the right methodology - I'd rather show data than argue architecture.
Honest answer: I have one real head-to-head so far, not the repeated benchmark you're describing. On a small CLI build task, same agent with vs without PMB, measured from the agent's own token/turn telemetry: roughly 28% fewer tokens, ~31% less wall-clock, and fewer failed steps - mostly because the agent recalled the project's test command and conventions instead of re-deriving them by trial and error. Directionally strong, but it's n=1, so I won't dress it up as more than that.
What you're describing - one fixed task (a todo app is perfect), with/without PMB, N runs each, reporting tokens, turns and pass/fail plus a quality rubric - is exactly the benchmark that should exist, and the honest way to prove or kill the claim. I'm going to build it as a reproducible harness in the repo so anyone can run it, not just trust my screenshot. Token/time is easy to measure; the "quality of delivery" side is the harder part - that needs a fixed rubric or an LLM judge so it isn't me grading my own homework.
DiffSense
@oleksiijko really cool. Put it in a CI run. Run it every week as models improve etc. Then you have hard proof. also make a few different benchmarks maybe. some creative challanges. some more mechanical etc. That would be cool. And lead with that. people convert on outcomes. 5x faster 10x cheaper. reads better than. shared memory in all your tools etc. ✌️
Mailwarm
How do you decide what gets written into memory, like is it automatic from chats or only explicit saves?
PMB
@karimbenkeroum Both. An ambient layer auto-captures the work product - decisions, lessons, corrections (correct the agent and it's saved as a high-priority lesson), completed work - deduped so nothing's stored twice. Explicit "remember this" is the override for pins, personal facts, or future plans. Default is you shouldn't have to tell it.
github.com/oleksiijko/pmb
Persistent memory for coding agents is one of those features where “what not to remember” matters as much as what to store.
The part I’d be most curious to see is a small memory diff after each session: new lesson added, old assumption updated, and which memory actually influenced a suggestion.
That would make it easier to trust local memory instead of treating it like a hidden second prompt. Also helps catch stale project decisions before an agent keeps repeating them.
PMB
@grace_lee26 Completely agree - "what not to remember" is the whole game, and treating memory as an auditable layer rather than a hidden second prompt is exactly the right framing.
The session diff is a great way to get there, and most of the raw material is already in place: every event is session-tagged and timestamped, and the dashboard already tracks which lessons influenced outcomes. What's missing is packaging that into a tidy after-session view - "here's the new lesson, here's the assumption that changed, here's the memory that shaped this suggestion." It's a clean thing to build on top of what's there, and it does double duty: makes local memory trustable, and surfaces stale decisions before an agent keeps acting on them. Going on the list.
Repo if you want to follow along: https://github.com/oleksiijko/pmb
Love the local-first SQLite approach here; keeping project memory on-disk instead of a hosted service is a smart trust boundary, because sensitive architecture decisions and lessons learned never leave your machine.
PMB
@ilko_kacharov Exactly the intent - a deliberate trust boundary: architecture decisions and lessons never leave your machine, and secrets are redacted on write. Thanks for getting it.
github.com/oleksiijko/pmb
Every new Claude Code session I spend the first few minutes re-explaining the same architecture decisions. The local-first + MCP approach is the right call, once project memory lives in a third-party cloud it becomes a security conversation for any serious team. The part I'd want most is the impact tracking that shows which memories actually influenced agent suggestions. Context without attribution is just noise. Look forward to hearing from you.
PMB
@frankgebuilder Exactly. The security boundary is a big part of why PMB is local-first: architecture decisions, mistakes, internal constraints, and “please never do this again” rules are exactly the kind of context teams do not want drifting into a third-party memory layer. And I completely agree on attribution. Memory should not just be injected silently and hoped for the best. PMB already tracks this for lessons: when a lesson is surfaced, it gets a surface_id; later agent actions can be linked back to that surface, and PMB tracks whether it was followed, ignored, or not applicable.
There is also an “Earned Memory” layer that connects surfaced lessons to outcomes like tests passing, builds, deploys, red-to-green fixes, and churn, so you can see which memories are actually pulling weight.
The next step is expanding that attribution beyond lessons into a fuller per-session influence trail for decisions, facts, goals, and project context. That’s the difference I want PMB to make: not just more context, but accountable context. If memory changes the agent’s behavior, you should be able to see why.
Local-first memory for coding agents is interesting because context drift is usually where my sessions go sideways. I like that it avoids another hosted workspace. Curious how you handle stale project assumptions after a big refactor?
PMB
@xiaosong001 That’s exactly the tricky part. PMB should not treat memory as “true forever”.
Today recall is weighted by relevance, importance, recency, and graph/entity proximity, so newer memories tied to the files or services you’re touching can outrank older generic assumptions. For clean state-like facts, PMB also supports supersession/validity windows.
The next step is making this more explicit in the dashboard: active / superseded / needs-review states for project assumptions after big refactors.
Repo if you want to take a look or follow the work: https://github.com/oleksiijko/pmb
Love that the whole pitch is just "stop re-explaining your project" repeated like a mantra. That repetition actually sells it, because re-feeding context every session is the exact pain I feel daily with our coding agents.
PMB
@yibo_wang3 Exactly - I wanted the pitch to stay boringly specific because the pain is boringly repetitive.
It’s not “AI memory magic”. It’s just: stop teaching the same project decisions, bugs, and rules to every new agent session. That daily context tax is the whole reason PMB exists.
@oleksiijko - Very nice. Definitely beats managing multiple .md files locally with skill integration. Will review in more detail, but looks very promising.
PMB
@francois_marais_nz Thanks - that's exactly the itch: replacing the sprawl of hand-maintained .md files with something that captures and retrieves on its own. Enjoy the deeper look, would genuinely value your take after.
github.com/oleksiijko/pmb