Agentmemory - Persistent memory for Claude Code, Codex & coding agents

You can now give Hermes, Claude Code, and Codex infinite memory. Agentmemory is trending on GitHub with 5,000+ stars.

At 240 observations, CLAUDE.md dumps 22,000+ tokens into context; agentmemory uses 1,900 tokens for the same observations, 92% less. At 1,000 observations, 80% of your built-in memories become invisible, while agentmemory keeps 100% searchable.

Benchmarked on 240 real coding sessions:
→ Up to 95% fewer tokens per session
→ 200x more tool calls before hitting context limits
→ 100% open source

Replies

Instead of adding infrastructure on top of Claude or Codex, can't you write this as a skill and do a periodic memory compression and refresh?

I built something similar for my production agents — each one maintains a JSON-based memory file tracking known issues, competitor changes, and operational state. The hardest part wasn't storing the memory, it was deciding what to forget. Without pruning, the context window fills with stale observations from weeks ago and the agent starts making decisions based on outdated reality.
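A minimal sketch of that kind of pruning, assuming each memory entry is a JSON object with a `topic` and an ISO 8601 `observed_at` timestamp. The field names, the 14-day cutoff, and the 200-entry cap are illustrative assumptions, not the commenter's actual schema and not agentmemory's behavior.

```python
from datetime import datetime, timedelta, timezone


def prune_memory(entries, now, max_age_days=14, max_entries=200):
    """Drop entries older than max_age_days, then keep the newest max_entries.

    `entries` is a list of dicts with a hypothetical "observed_at" field
    holding an ISO 8601 timestamp that includes a UTC offset.
    """
    cutoff = now - timedelta(days=max_age_days)
    fresh = [
        e for e in entries
        if datetime.fromisoformat(e["observed_at"]) >= cutoff
    ]
    # Newest first, so the size cap discards the oldest survivors.
    fresh.sort(key=lambda e: datetime.fromisoformat(e["observed_at"]), reverse=True)
    return fresh[:max_entries]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
entries = [
    {"topic": "known issue", "observed_at": "2024-05-30T00:00:00+00:00"},
    {"topic": "old competitor note", "observed_at": "2024-04-01T00:00:00+00:00"},
    {"topic": "ops state", "observed_at": "2024-05-25T00:00:00+00:00"},
]
kept = prune_memory(entries, now)
print([e["topic"] for e in kept])  # ['known issue', 'ops state']
```

Age alone is a blunt rule: a two-month-old note about a hard constraint may still matter, so a real system would likely combine age with some measure of importance or recent use.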

95% token reduction is a big claim — how does it handle memory conflicts when the same topic gets updated by different sessions? In my system I had to implement a "trust what you observe now, update the stale memory" rule because old memories would override fresh observations.
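The "trust what you observe now" rule could be sketched as a last-observation-wins merge keyed by topic. This assumes every observation carries a timestamp; the `reconcile` function and its field names are hypothetical and say nothing about how agentmemory resolves conflicts.

```python
def reconcile(memory, topic, value, observed_at):
    """Store the observation unless memory already holds a newer one.

    `memory` maps topic -> {"value", "observed_at"}; `observed_at` is a
    numeric timestamp (for example, seconds since the epoch).
    Returns the value that is current for the topic after the merge.
    """
    stored = memory.get(topic)
    if stored is None or observed_at >= stored["observed_at"]:
        memory[topic] = {"value": value, "observed_at": observed_at}
    return memory[topic]["value"]


memory = {}
reconcile(memory, "database", "postgres 14", observed_at=100.0)
reconcile(memory, "database", "postgres 16", observed_at=200.0)
# A slower session reports an older observation late; it must not win.
current = reconcile(memory, "database", "postgres 15", observed_at=150.0)
print(current)  # postgres 16
```

Comparing observation times rather than write order is what keeps a slow session from overwriting a fresher fact written by another session.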

Hi team, this looks amazing, and well done on the explanation docs. I have a quick question, though: I work across multiple machines (home, work, laptop, etc.). Can the memory move with me? Currently I use Dropbox to keep everything in sync (hands-off), but I'm not sure Dropbox with a live KV cache is a good idea (it would produce plenty of "conflicted copy" files), and I wonder if you have a suggestion.

The good news is that I only use one machine at a time, and they share a VPN network. With that in mind, do you have a hands-off way to keep these in sync?

This is wonderful, man! I am using it with Claude Code and seeing reduced token consumption. This is amazing!

Very happy to hear this.

The 92% context compression is the number that actually matters here. I've run into the token dump problem myself, where the context window gets eaten before the agent even starts doing real work. The hybrid BM25 + vector search approach makes sense for this use case too: pure vector search misses exact identifier matches, which in a codebase is basically everything important (function names, variable names, file paths).
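To make the hybrid idea concrete, here is a small self-contained sketch. It assumes Okapi BM25 for the lexical side, character-trigram cosine similarity as a crude stand-in for a real embedding model, and reciprocal rank fusion to combine the two rankings. All three choices are assumptions for illustration; the post does not say how agentmemory scores or fuses results.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Keep identifiers and paths such as parse_config or src/app.py whole.
    return re.findall(r"[a-z0-9_./]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the query."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    df = Counter(term for t in tokenized for term in set(t))
    n = len(docs)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def trigram_vector(text):
    # Toy "embedding": counts of character trigrams (NOT a real model).
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_rank(query, docs, k=60):
    """Return document indices, best first, via reciprocal rank fusion."""
    lexical = bm25_scores(query, docs)
    qv = trigram_vector(query)
    semantic = [cosine(qv, trigram_vector(d)) for d in docs]
    fused = [0.0] * len(docs)
    for scores in (lexical, semantic):
        order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
        for rank, i in enumerate(order, start=1):
            fused[i] += 1.0 / (k + rank)
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)


docs = [
    "refactored the login flow to use refresh tokens",
    "parse_config in src/settings/loader.py now validates the schema",
    "discussed configuration parsing strategies in general",
]
ranking = hybrid_rank("parse_config", docs)
print(docs[ranking[0]])
```

Only the second memory contains the exact token `parse_config`, so BM25 gives it the sole nonzero lexical score, and the fused ranking puts it first even though the third memory is topically similar.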

One thing I'm curious about: how are you handling memory staleness? If I refactored a module two weeks ago and the old architecture is still in memory, the agent might confidently suggest patterns that no longer apply. Is there a TTL mechanism, or does it rely on the agent to overwrite outdated memories when it encounters contradictions? That's the failure mode I'd be most worried about in a real codebase.

I really like the hybrid (BM25 + vector search) approach to this problem. Pure semantic search for exact function names and file paths in large codebases does not work very well. What do you do about memory staleness? If a given architecture is refactored, what prevents the agent from retrieving the outdated pattern in the next session?

Persistent, useful memory seems to be key for multiple AI applications, not just coding. Two genuine questions:
How does it scale with intense usage over the course of a year?
How does it handle successive contradictions?

The problem this solves is real — I've had the exact experience of explaining my project's architecture to an agent, solving a problem together, and then having to explain it all over again the next session like none of it happened. What I'd want to understand before trusting this fully: how does it handle cases where the architecture changes? If I refactor something significantly, does the old memory become a liability that misleads future sessions, or does the system detect drift and update accordingly?