projectmem

Memory + judgment for AI coding agents (local, MIT)

21 followers

Lightweight memory + judgment layer for AI coding agents. No daemon, no ports — just stdio MCP + plain-text JSONL in your repo. Captures bugs, failed attempts, and fixes inside your project, then warns at git commit before you repeat a known dead-end. 14 tools, ~600 LOC, open source, MIT, 100% local — your AI finally remembers what it tried last week.

Free

Launch tags: Open Source • Developer Tools • Artificial Intelligence

Maker

For anyone curious how it actually works — I just posted a full video walkthrough. Real project, real bugs, and the pre-commit warning firing live. Happy to answer anything 🙏

Watch: https://youtu.be/pELGdXHj_Ls?si=9o0yZ3WL0jP2hyTE

Hi PH 👋

A few weeks ago I watched my AI coding agent confidently suggest the same broken CSS fix I'd rejected the previous Friday. Same contain: layout solution. No memory I'd already tried it. Every new chat was Groundhog Day.

I'd been hacking on a fix for months — that incident pushed me to finally ship.

projectmem is a local-first memory + judgment layer for AI coding agents. It captures development events — bugs, failed attempts, fixes, decisions, gotchas — into plain-text files inside your repo. Your AI agent (Claude / Cursor / Antigravity / Codex) reads it through 14 MCP tools. Same agent, same model — but now it actually remembers what worked, what didn't, and why.

Five killer features, all wired by one pjm init:

1. Pre-commit warning — the git hook checks your staged file against memory and warns BEFORE the commit if there's a logged failed approach on that file. The killer feature. Memory + judgment at the moment of action.

2. Cross-project memory — lessons learned in one repo automatically surface in other repos with the same stack (~/.projectmem/global/). A React gotcha you fix in proj-a appears in proj-b. Stack-aware filtering, so a vite project's "next" mention doesn't pollute Next.js gotchas in your actual Next.js repos. 100% local, no cloud sync.

3. Provable ROI (pjm score) — an A+ → F grade backed by concrete numbers: debugging hours saved, tokens prevented, dollars protected. Output as terminal, JSON for CI, or a shields.io badge for your README. The first AI memory tool with metrics a CTO can actually verify.

4. Smart context injection (pjm wrap) — launches your agent with a token-budgeted context block already loaded. Your AI starts experienced, not blank. Works with Claude Code, Cursor, Aider, and clipboard-paste for the rest.

5. Real-time file watcher (pjm watch) — auto-starts on pjm init in interactive terminals. Catches rapid edits to the same file (debugging sessions). Battery-aware, gitignore-aware, opt-out with --no-watch.
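To make the pre-commit warning in item 1 concrete, here is a minimal sketch of how a hook could match staged paths against a JSONL memory file. The event fields (`type`, `outcome`, `file`, `summary`) and the function name `warnings_for` are assumptions made for illustration; they are not taken from projectmem's actual schema or hook code.

```python
import json


def warnings_for(staged_files, jsonl_lines):
    """Return failed attempts and gotchas logged against any staged file.

    Field names here are hypothetical, chosen only for this sketch.
    """
    staged = set(staged_files)
    hits = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        event = json.loads(line)
        failed_attempt = (
            event.get("type") == "attempt" and event.get("outcome") == "failed"
        )
        gotcha = event.get("type") == "gotcha"
        if event.get("file") in staged and (failed_attempt or gotcha):
            hits.append(event)
    return hits


# Illustrative memory contents, one JSON object per line.
memory = [
    '{"type": "attempt", "outcome": "failed", "file": "styles.css", '
    '"summary": "contain: layout did not fix the overflow"}',
    '{"type": "attempt", "outcome": "worked", "file": "styles.css", '
    '"summary": "moved overflow to the wrapper"}',
    '{"type": "gotcha", "file": "auth.py", '
    '"summary": "token refresh races on cold start"}',
]

for event in warnings_for(["styles.css"], memory):
    print(f"warning: {event['file']}: {event['summary']}")
```

In a real hook the staged paths would likely come from `git diff --cached --name-only`; the list is hard-coded here so the sketch runs without a repository.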

Plus an interactive D3 dashboard (pjm visualize) with four views, all auto-generated from your memory — zero extra AI tokens:

• Story Map — every decision, milestone, and failure as an interactive graph. Failed files glow red in the heatmap.
• ROI Dashboard — animated counters for tokens prevented and USD protected, capture-source donut, cumulative savings chart.
• Architecture Map — toggle between a horizontal dendrogram (hierarchical) and a force-directed graph (relationships, churn).
• Event Timeline — chronological events with AUTO / Manual badges and a project-activity bar chart.

The framing I keep landing on: most "AI memory" tools are retrieval engines — they store conversations and surface them when asked. projectmem is a judgment layer — it captures events with explicit outcomes (worked / failed / partial) and uses git context to interrupt you before you waste another afternoon.

What it isn't:

→ Not a chat memory — captures development events, not conversations
→ Not a retrieval engine — search is exact substring; semantic search is opt-in for v0.2
→ Not a daemon — stdio MCP, no port, no process to babysit
→ Not networked — no cloud, no telemetry, no accounts

100% local. MIT. ~600 LOC Python. 58 unit tests. End-to-end verified across Claude Desktop, Claude Code, Cursor, Antigravity, Codex.

Install:

pip install projectmem

Try it on a real project — the pre-commit hook usually catches its first real failure within the first week. If it doesn't, you've lost 60 seconds of pjm init. If it does, you've recovered an afternoon.

Honest about rough edges — search is substring not semantic, API may shift before 1.0, the precheck heuristics are simple right now. v0.2 roadmap: stale-memory detection (flag, never delete), explicit --supersedes on add_decision, semantic search as opt-in.

Genuinely curious what would make this useful for your stack. What's the worst "I already told you this last week" moment you've had with your AI agent? That's exactly the pattern I optimized this for.

Thanks for taking a look 🙏

— Ripon


2mo ago

@riponcm The pre-commit warning is especially interesting because it catches mistakes before the same bad fix gets repeated. How do you decide what becomes useful memory vs. noise? Do developers log it manually, or does projectmem infer it from failed attempts and fixes?


2mo ago

Maker

@dmitrii_volosatov Great question — this is where I made a deliberate trade-off.

projectmem is agent-initiated, schema-constrained. The MCP server exposes typed tools (log_issue, record_attempt, record_fix, add_decision, etc.) and an AI_Instruction .md file tells the agent when to call them. So when you say "that didn't work, log it as a failed attempt," the agent calls record_attempt with outcome: "failed" — the schema literally rejects anything outside worked|failed|partial. That structural pressure does a surprising amount of noise-filtering on its own. No free-form "the AI thought this might be relevant" entries.
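As a rough illustration of that structural pressure, the sketch below rejects any outcome outside the three allowed values before an event is created. Only the tool name `record_attempt` and the `worked|failed|partial` set come from the description above; the `Attempt` class, its fields, and the error message are assumptions.

```python
from dataclasses import dataclass

ALLOWED_OUTCOMES = {"worked", "failed", "partial"}


@dataclass(frozen=True)
class Attempt:
    """Hypothetical event shape; the real schema may differ."""

    issue_id: str
    summary: str
    outcome: str

    def __post_init__(self):
        # Reject free-form outcomes at construction time.
        if self.outcome not in ALLOWED_OUTCOMES:
            raise ValueError(
                f"outcome must be one of {sorted(ALLOWED_OUTCOMES)}, "
                f"got {self.outcome!r}"
            )


def record_attempt(issue_id, summary, outcome):
    """Build a validated attempt event (sketch only, nothing is persisted)."""
    return Attempt(issue_id, summary, outcome)


attempt = record_attempt("ISSUE-1", "contain: layout on the sidebar", "failed")
print(attempt.outcome)
```

Because validation happens when the event is built, an agent that tries `outcome="maybe relevant"` gets an error back instead of writing a vague entry.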

Some things are automatic: git commits captured via post-commit hook, secret redaction on every event before it hits disk, and stack detection at init. But the judgment of "is this worth remembering" is offloaded to the agent + developer in the loop. Manual override via pjm CLI for when the agent misses something.
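The "redact before it hits disk" step could look something like the sketch below. The regex, the `[REDACTED]` marker, and the `redact` / `append_event` helpers are all assumptions for illustration; a real redactor would need to cover many more secret formats.

```python
import io
import json
import re

# Hypothetical pattern: key-like names followed by = or : and a value.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\b(\s*[=:]\s*)(\S+)"),
]


def redact(text):
    """Replace the value part of anything that looks like a credential."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(
            lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text
        )
    return text


def append_event(stream, event):
    """Redact every string field, then write one JSON object per line."""
    clean = {
        key: redact(value) if isinstance(value, str) else value
        for key, value in event.items()
    }
    stream.write(json.dumps(clean) + "\n")


# An in-memory buffer stands in for the JSONL file in the repo.
buffer = io.StringIO()
append_event(
    buffer,
    {"type": "note", "summary": "deploy failed with API_KEY=sk-12345 set"},
)
print(buffer.getvalue(), end="")
```

Redacting inside the single write path means no caller can skip the step by accident.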

The honest limitation: quality is only as good as the agent's discipline in following the instructions. I've seen Claude and Codex log diligently; some smaller models forget. v0.3 will probably add a lightweight importance/dedup pass — but I'm wary of adding an ML classifier that becomes its own source of noise. Structure first, ranking later.

Appreciate the question — this is exactly the design tension I keep wrestling with 🙏


2mo ago

Memory for coding agents makes sense, but the interesting engineering challenge is relevance filtering. A codebase with months of history has way more context than any session can use. The 'judgment' part of the name suggests projectmem has an opinion on what to surface and what to leave out. Would love to understand the mechanism behind that: is it semantic similarity, recency, or something more task-aware?


2mo ago

Maker

@ayushi18 Thanks, Ayushi! Really good question, and you put your finger on exactly the part I think about most.

Honest answer for v0.1.3: the "judgment" today is structural, not semantic. There are no embeddings yet. The filtering happens on three axes:

1. Structure. Events aren't a flat chat log; they're typed: issue, attempt (worked|failed|partial), fix, decision, note, gotcha. So when an agent calls get_context, it's not getting raw history; it's getting open issues + recent decisions + the latest fix per issue. Closed-and-fixed stuff drops out by default.

2. File-path keying. This is where the pre-commit warning gets its bite. When you git commit styles.css, the hook runs precheck_file styles.css, which surfaces only failed attempts and gotchas tied to that path. So "months of history" never floods in; you get the slice that's literally about the code you're touching right now. Task-aware in the narrow sense.

3. Recency + caps. get_context defaults to the last N events with a hard cap (configurable). search_events is keyword + type filter, not vector search.

So today: opinionated because of the schema, not because of an ML ranker.
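A minimal sketch of what that structural filtering might look like, under stated assumptions: events are dicts ordered oldest to newest, an issue counts as closed once any fix references it, and the field names (`id`, `issue_id`, `summary`) and the `max_decisions` parameter are hypothetical. Only the `get_context` name and the "open issues + recent decisions + latest fix per issue" shape come from the explanation above.

```python
def get_context(events, max_decisions=5):
    """Build a capped context block from typed events (oldest first)."""
    issues = {}
    latest_fix = {}
    decisions = []
    for event in events:
        kind = event["type"]
        if kind == "issue":
            issues[event["id"]] = event
        elif kind == "fix":
            # Later fixes overwrite earlier ones for the same issue.
            latest_fix[event["issue_id"]] = event
        elif kind == "decision":
            decisions.append(event)
    # Assumption: an issue with at least one fix is treated as closed.
    open_issues = [
        issue for issue_id, issue in issues.items() if issue_id not in latest_fix
    ]
    recent = decisions[-max_decisions:] if max_decisions > 0 else []
    return {
        "open_issues": open_issues,
        "recent_decisions": recent,
        "latest_fixes": list(latest_fix.values()),
    }


events = [
    {"type": "issue", "id": "I1", "summary": "sidebar overflows"},
    {"type": "issue", "id": "I2", "summary": "login loop"},
    {"type": "fix", "issue_id": "I1", "summary": "first fix"},
    {"type": "fix", "issue_id": "I1", "summary": "second fix"},
    {"type": "decision", "summary": "use CSS grid"},
]

context = get_context(events)
print([issue["id"] for issue in context["open_issues"]])
```

There is no ranking model anywhere in this path; the event types alone decide what gets surfaced.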

Semantic similarity is the obvious next layer; it's on the v0.3 roadmap (local embeddings, no cloud). The harder question I keep circling is task-aware ranking: knowing the agent is debugging a layout bug vs. refactoring auth, and weighting differently. That probably needs the agent itself to pass a query intent, not just a keyword. Still figuring it out.

Appreciate the sharp question. This is exactly the kind of feedback I was hoping launch day would surface.


2mo ago

@riponcm Thanks for breaking that down so clearly. The file-path keying for the pre-commit hook makes a lot of sense: scoping context to the exact file being committed is a much tighter signal than trying to rank all project history at once. The part I find most interesting is your point about task-aware ranking needing the agent to pass query intent. That's essentially asking the agent to be self-aware about what kind of problem it's solving, which feels like the harder problem underneath the memory problem. Looking forward to seeing how v0.3 approaches it.


2mo ago

Maker

@ayushi18 Exactly! Being self-aware about what problem it's solving is the real bottleneck, and it sits one layer below memory. My current hunch is that the agent doesn't need full intent, just change-scope (which files, debug vs. refactor vs. new code): a lighter signal, and probably enough to weight retrieval. Thanks for pushing on this 🙏


2mo ago

Forum Threads

p/projectmem

1mo ago

projectmem v0.1.4 is live — the accountable-judgment release

Hey everyone! Three weeks after launch, projectmem v0.1.4 is out, and it's the release I'm most proud of.

The core idea got sharper: your AI's memory now flags its own staleness instead of silently trusting (or deleting) old decisions.

What's new:

Stale-memory detection: when a decision's file has moved on in git, projectmem flags it ("predates 7 commits; confirm or supersede"). It never deletes; you decide.
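One way the staleness check could work, sketched under assumptions: each decision records the commit hash it was made at, and the file's history is available as a newest-first list of hashes (the order `git log --format=%H -- <path>` prints). The `commit` field, the `threshold` parameter, and both function names are hypothetical.

```python
def commits_since(decision_commit, file_history):
    """Count commits to the file that are newer than the decision's commit.

    file_history is newest first, so the commit's index equals the number
    of newer commits. Returns None if the commit is not in the history.
    """
    try:
        return file_history.index(decision_commit)
    except ValueError:
        return None


def staleness_flag(decision, file_history, threshold=1):
    """Return a flag message for a stale decision, or None. Never deletes."""
    newer = commits_since(decision["commit"], file_history)
    if newer is None or newer < threshold:
        return None
    noun = "commit" if newer == 1 else "commits"
    return f"predates {newer} {noun}"


# Illustrative history for one file, newest commit first.
history = ["c3", "c2", "c1", "c0"]
decision = {"summary": "use CSS grid for the sidebar", "commit": "c1"}
print(staleness_flag(decision, history))
```

The function only returns a message; whether to confirm or supersede the decision stays with the developer.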
