MemoryOS
AI agent memory with knowledge graph and 78ms retrieval


I built MemoryOS — an open-source, self-hosted memory layer for AI agents that is fast, benchmarked, and actually production-ready.
On the latest LongMemEval-s (ICLR 2025) benchmark:
HydraDB — 90.79%, <200ms — closed source, $249/mo
Supermemory — 85.4%, <300ms — open source, $19/mo
MemoryOS — 86.2%, 78ms — open source, free, self-hosted
The goal wasn’t just to build “vector DB + RAG.”
I wanted a memory system that behaves more like human memory: temporal, contextual, and adaptive over time.
How it works
MemoryOS uses an append-only temporal knowledge graph.
Instead of overwriting facts, memories evolve over time. Every relationship is stored as:
(subject, predicate, object, t_valid_start, t_valid_end)
When new information arrives, the previous edge is closed with a timestamp and a new edge is appended. Nothing is deleted.
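Here's a minimal sketch of that append-and-close update. The Edge and TemporalGraph names are illustrative, not the actual MemoryOS API:

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    object: str
    t_valid_start: datetime
    t_valid_end: Optional[datetime] = None  # None means "currently valid"

class TemporalGraph:
    def __init__(self) -> None:
        self.edges: list[Edge] = []

    def assert_fact(self, subject: str, predicate: str, object: str) -> None:
        now = datetime.now(timezone.utc)
        # Close the currently-valid edge for (subject, predicate) instead of overwriting it
        self.edges = [
            replace(e, t_valid_end=now)
            if e.subject == subject and e.predicate == predicate and e.t_valid_end is None
            else e
            for e in self.edges
        ]
        # Append the new fact; nothing is ever deleted
        self.edges.append(Edge(subject, predicate, object, t_valid_start=now))
```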
That means you can query historical state naturally:
“Where does Alice live now?”
“Where did Alice live in 2022?”
Both return different answers from the same graph.
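Continuing the sketch above, a point-in-time lookup is just a validity-interval filter (the commented answers are made up):

```python
def query_at(edges, subject: str, predicate: str, at: datetime):
    """Return the object that was valid at time `at`, or None."""
    for e in edges:
        if (e.subject == subject and e.predicate == predicate
                and e.t_valid_start <= at
                and (e.t_valid_end is None or at < e.t_valid_end)):
            return e.object
    return None

# "Where did Alice live in 2022?"  vs.  "Where does Alice live now?"
# query_at(g.edges, "Alice", "lives_in", datetime(2022, 6, 1, tzinfo=timezone.utc))  # e.g. "Berlin"
# query_at(g.edges, "Alice", "lives_in", datetime.now(timezone.utc))                 # e.g. "Lisbon"
```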
Hybrid retrieval pipeline
Each memory chunk is transformed into three separate representations:
v_content — dense embedding of the raw text
v_latent — embedding of LLM-enriched text with resolved references and explicit entities
BM25 sparse weights — keyword relevance for exact-match retrieval
At query time, all signals are combined with graph proximity into a unified hybrid score.
This dramatically improves retrieval quality on ambiguous or long-context queries.
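Conceptually, the fusion step looks something like this. The mixing weights and field names are illustrative rather than MemoryOS's actual defaults, and embeddings are assumed to be L2-normalized so a dot product equals cosine similarity:

```python
import numpy as np

def hybrid_score(query, chunk, w=(0.4, 0.3, 0.2, 0.1)):
    """Blend dense, sparse, and graph signals into one relevance score.

    query: {"dense": np.ndarray, "terms": {token: weight}}
    chunk: {"v_content": np.ndarray, "v_latent": np.ndarray,
            "bm25": {token: weight}, "graph_hops": int}
    """
    w_content, w_latent, w_sparse, w_graph = w
    s_content = float(query["dense"] @ chunk["v_content"])  # raw-text similarity
    s_latent = float(query["dense"] @ chunk["v_latent"])    # enriched-text similarity
    s_sparse = sum(w_t * chunk["bm25"].get(t, 0.0) for t, w_t in query["terms"].items())
    s_graph = 1.0 / (1.0 + chunk["graph_hops"])             # nearer in the graph scores higher
    return w_content * s_content + w_latent * s_latent + w_sparse * s_sparse + w_graph * s_graph
```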
Built for low latency
The retrieval hot path runs entirely locally:
sentence-transformers/all-MiniLM-L6-v2 for embeddings (~14ms)
pgvector HNSW for ANN search (~20ms)
numpy scoring pipeline
9ms/message batch ingest
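Stitched together, the local part of that path maps onto standard tooling roughly like this (the memories table and its columns are placeholders):

```python
import psycopg2
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def ann_search(conn, query: str, k: int = 20):
    # Local embedding, ~14ms for a short query on CPU
    vec = model.encode(query, normalize_embeddings=True).tolist()
    with conn.cursor() as cur:
        # pgvector's HNSW index serves this cosine-distance scan in ~20ms
        cur.execute(
            "SELECT id, content FROM memories ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(vec), k),
        )
        return cur.fetchall()
```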
The only optional external dependency is Cohere rerank, which adds ~450ms latency but noticeably improves precision for difficult semantic queries.
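When that stage is enabled, it slots in after the ANN pass. With Cohere's Python SDK the call looks roughly like this; the model choice and top_n are illustrative:

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # only constructed when reranking is enabled

def rerank(query: str, docs: list[str], top_n: int = 5) -> list[str]:
    # One network round trip: ~450ms, but better ordering on hard semantic queries
    resp = co.rerank(model="rerank-english-v3.0", query=query, documents=docs, top_n=top_n)
    return [docs[r.index] for r in resp.results]
```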
Human-inspired memory decay
MemoryOS also includes a forgetting engine inspired by the Ebbinghaus forgetting curve:
R = e^(-t/S), where R is retention, t is the time since last access, and S is the memory's stability.
Frequently accessed memories become harder to forget because the stability factor (S) increases with each retrieval.
Unused memories gradually decay and are automatically archived once retention falls below a configurable threshold.
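In code, the whole decay rule is a few lines. The boost factor and archive threshold below are illustrative knobs, not shipped defaults:

```python
import math
import time

ARCHIVE_THRESHOLD = 0.05  # illustrative: archive once retention drops below 5%

def retention(seconds_since_access: float, stability: float) -> float:
    # Ebbinghaus forgetting curve: R = e^(-t/S)
    return math.exp(-seconds_since_access / stability)

def on_retrieval(memory: dict) -> None:
    # Each access strengthens the memory: S grows, so future decay is slower
    memory["stability"] *= 1.5  # illustrative boost per retrieval
    memory["last_access"] = time.time()

def should_archive(memory: dict) -> bool:
    elapsed = time.time() - memory["last_access"]
    return retention(elapsed, memory["stability"]) < ARCHIVE_THRESHOLD
```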
The result is a memory system that stays relevant without growing infinitely noisy.
Why I built this
Most “AI memory” products today fall into one of three categories:
thin wrappers around vector databases
closed-source APIs with vendor lock-in
systems optimized for demos instead of long-term reasoning
I wanted something developers could fully own, inspect, modify, and deploy themselves.
MemoryOS is:
Open source
Self-hostable
Fast enough for real-time agents
Designed for long-term contextual memory
Benchmark-tested instead of benchmark-marketed
If you’re building AI agents, copilots, autonomous workflows, or persistent assistants, I’d love feedback from the community.