MemoryOS
AI agent memory with knowledge graph and 78ms retrieval


I built MemoryOS — an open-source, self-hosted memory layer for AI agents that is fast, benchmarked, and actually production-ready.
On the latest LongMemEval-s (ICLR 2025) benchmark:
HydraDB — 90.79%, <200ms — closed source, $249/mo
Supermemory — 85.4%, <300ms — open source, $19/mo
MemoryOS — 86.2%, 78ms — open source, free, self-hosted
The goal wasn’t just to build “vector DB + RAG.”
I wanted a memory system that behaves more like human memory: temporal, contextual, and adaptive over time.
How it works
MemoryOS uses an append-only temporal knowledge graph.
Instead of overwriting facts, memories evolve over time. Every relationship is stored as:
(subject, predicate, object, t_valid_start, t_valid_end)
When new information arrives, the previous edge is closed with a timestamp and a new edge is appended. Nothing is deleted.
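Here's a minimal sketch of that append-and-close update. The Edge and TemporalGraph names are illustrative, not the actual MemoryOS API:

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    object: str
    t_valid_start: datetime
    t_valid_end: Optional[datetime] = None  # None means "currently valid"

class TemporalGraph:
    def __init__(self) -> None:
        self.edges: list[Edge] = []

    def assert_fact(self, subject: str, predicate: str, object: str) -> None:
        now = datetime.now(timezone.utc)
        # Close the currently-valid edge for (subject, predicate) instead of overwriting it
        self.edges = [
            replace(e, t_valid_end=now)
            if e.subject == subject and e.predicate == predicate and e.t_valid_end is None
            else e
            for e in self.edges
        ]
        # Append the new fact; nothing is ever deleted
        self.edges.append(Edge(subject, predicate, object, t_valid_start=now))
```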
That means you can query historical state naturally:
“Where does Alice live now?”
“Where did Alice live in 2022?”
Both return different answers from the same graph.
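Continuing the sketch above, a point-in-time lookup is just a validity-interval filter (the commented answers are made up):

```python
def query_at(edges, subject: str, predicate: str, at: datetime):
    """Return the object that was valid at time `at`, or None."""
    for e in edges:
        if (e.subject == subject and e.predicate == predicate
                and e.t_valid_start <= at
                and (e.t_valid_end is None or at < e.t_valid_end)):
            return e.object
    return None

# "Where did Alice live in 2022?"  vs.  "Where does Alice live now?"
# query_at(g.edges, "Alice", "lives_in", datetime(2022, 6, 1, tzinfo=timezone.utc))  # e.g. "Berlin"
# query_at(g.edges, "Alice", "lives_in", datetime.now(timezone.utc))                 # e.g. "Lisbon"
```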
Hybrid retrieval pipeline
Each memory chunk is transformed into three separate representations:
v_content — dense embedding of the raw text
v_latent — embedding of LLM-enriched text with resolved references and explicit entities
BM25 sparse weights — keyword relevance for exact-match retrieval
At query time, all signals are combined with graph proximity into a unified hybrid score.
This dramatically improves retrieval quality on ambiguous or long-context queries.
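Conceptually, the fusion step looks something like this. The mixing weights and field names are illustrative rather than MemoryOS's actual defaults, and embeddings are assumed to be L2-normalized so a dot product equals cosine similarity:

```python
import numpy as np

def hybrid_score(query, chunk, w=(0.4, 0.3, 0.2, 0.1)):
    """Blend dense, sparse, and graph signals into one relevance score.

    query: {"dense": np.ndarray, "terms": {token: weight}}
    chunk: {"v_content": np.ndarray, "v_latent": np.ndarray,
            "bm25": {token: weight}, "graph_hops": int}
    """
    w_content, w_latent, w_sparse, w_graph = w
    s_content = float(query["dense"] @ chunk["v_content"])  # raw-text similarity
    s_latent = float(query["dense"] @ chunk["v_latent"])    # enriched-text similarity
    s_sparse = sum(w_t * chunk["bm25"].get(t, 0.0) for t, w_t in query["terms"].items())
    s_graph = 1.0 / (1.0 + chunk["graph_hops"])             # nearer in the graph scores higher
    return w_content * s_content + w_latent * s_latent + w_sparse * s_sparse + w_graph * s_graph
```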
Built for low latency
The retrieval hot path runs entirely locally:
sentence-transformers/all-MiniLM-L6-v2 for embeddings (~14ms)
pgvector HNSW for ANN search (~20ms)
numpy scoring pipeline
9ms/message batch ingest
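Stitched together, the local part of that path maps onto standard tooling roughly like this (the memories table and its columns are placeholders):

```python
import psycopg2
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def ann_search(conn, query: str, k: int = 20):
    # Local embedding, ~14ms for a short query on CPU
    vec = model.encode(query, normalize_embeddings=True).tolist()
    with conn.cursor() as cur:
        # pgvector's HNSW index serves this cosine-distance scan in ~20ms
        cur.execute(
            "SELECT id, content FROM memories ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(vec), k),
        )
        return cur.fetchall()
```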
The only optional external dependency is Cohere rerank, which adds ~450ms latency but noticeably improves precision for difficult semantic queries.
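When that stage is enabled, it slots in after the ANN pass. With Cohere's Python SDK the call looks roughly like this; the model choice and top_n are illustrative:

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # only constructed when reranking is enabled

def rerank(query: str, docs: list[str], top_n: int = 5) -> list[str]:
    # One network round trip: ~450ms, but better ordering on hard semantic queries
    resp = co.rerank(model="rerank-english-v3.0", query=query, documents=docs, top_n=top_n)
    return [docs[r.index] for r in resp.results]
```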
Human-inspired memory decay
MemoryOS also includes a forgetting engine inspired by the Ebbinghaus forgetting curve:
R = e^(-t/S), where R is retention, t is the time since last access, and S is the memory's stability.
Frequently accessed memories become harder to forget because the stability factor (S) increases with each retrieval.
Unused memories gradually decay and are automatically archived once retention falls below a configurable threshold.
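In code, the whole decay rule is a few lines. The boost factor and archive threshold below are illustrative knobs, not shipped defaults:

```python
import math
import time

ARCHIVE_THRESHOLD = 0.05  # illustrative: archive once retention drops below 5%

def retention(seconds_since_access: float, stability: float) -> float:
    # Ebbinghaus forgetting curve: R = e^(-t/S)
    return math.exp(-seconds_since_access / stability)

def on_retrieval(memory: dict) -> None:
    # Each access strengthens the memory: S grows, so future decay is slower
    memory["stability"] *= 1.5  # illustrative boost per retrieval
    memory["last_access"] = time.time()

def should_archive(memory: dict) -> bool:
    elapsed = time.time() - memory["last_access"]
    return retention(elapsed, memory["stability"]) < ARCHIVE_THRESHOLD
```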
The result is a memory system that stays relevant without growing infinitely noisy.
Why I built this
Most “AI memory” products today fall into one of three categories:
thin wrappers around vector databases
closed-source APIs with vendor lock-in
systems optimized for demos instead of long-term reasoning
I wanted something developers could fully own, inspect, modify, and deploy themselves.
MemoryOS is:
Open source
Self-hostable
Fast enough for real-time agents
Designed for long-term contextual memory
Benchmark-tested instead of benchmark-marketed
If you’re building AI agents, copilots, autonomous workflows, or persistent assistants, I’d love feedback from the community.