Announcement: We're launching Memanto on Product Hunt on June 16
Hi everyone, Majid here (co-founder & CTO). An announcement for the Moorcheh community first, because you've been with us longest.
On Tuesday, June 16, we're launching Memanto on Product Hunt. It's our open-source memory agent that gives Claude Code, Cursor, Codex, and a dozen other AI agents persistent memory across sessions. It's also the first product built end-to-end on Moorcheh's retrieval engine, and many of you have already seen pieces of it in the repo and on our YouTube channel.
What makes this launch different from what we've shipped before:
The entire stack now runs locally, for free. One pip install, and Memanto sets up Moorcheh on-prem, embedding models, and a local LLM via Ollama on your own machine. No API key, no data leaving your laptop. The free cloud option is still there, and you can switch between the two.
Where $2.5 Million a Year Actually Goes
When most people think about the cost of AI search, they think about the vector database. But the database is just the tip of the iceberg. Here's what the full cost stack actually looks like for a typical enterprise running retrieval-augmented generation (RAG) at scale:
The vector database cluster is the obvious one. To serve 150 multi-tenant customers with real-time retrieval, this customer was running Qdrant on a fleet of AWS memory-optimized instances (r7g.2xlarge) plus Kubernetes orchestration. Annual cost: ~$591,000. And that infrastructure runs 24/7, whether it's peak hours or 3 AM on a Sunday. You're renting RAM by the year to hold vectors that might get queried once an hour.
The reranking API is the cost nobody budgets for. Traditional vector databases use approximate search: they give you "close enough" results using a probabilistic algorithm called HNSW. For enterprise use cases in regulated industries, "close enough" isn't good enough. So teams bolt on a reranking service like Cohere Rerank to improve accuracy after the initial retrieval. That API call on every query, at this volume, costs roughly $1.5 million per year. It's the single biggest line item, and most teams don't see it coming until they're already in production.
The middleware and observability layer adds another surprise. Enterprise RAG requires auditability: you need to trace exactly which documents were retrieved, with what parameters, through what logic. Teams typically bolt on LangSmith or similar observability tooling on top of LangChain, which adds token overhead and tracing costs. For this customer: ~$378,000 per year.
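As a quick sanity check on the headline number, the three line items above can simply be added up. This is a minimal sketch: it assumes those three are the complete cost stack and treats the approximate figures as exact. The dictionary labels and function names are illustrative, not part of any Moorcheh or Memanto API.

```python
# Annual cost figures quoted above, in US dollars (approximate values
# treated as exact for the purpose of this back-of-the-envelope check).
ANNUAL_COSTS = {
    "vector database cluster": 591_000,
    "reranking API": 1_500_000,
    "middleware and observability": 378_000,
}

# The cluster is billed around the clock, so spread its cost over every hour.
HOURS_PER_YEAR = 24 * 365


def total_cost(costs):
    """Sum all annual line items."""
    return sum(costs.values())


def cost_shares(costs):
    """Return each line item's fraction of the total."""
    total = total_cost(costs)
    return {name: amount / total for name, amount in costs.items()}


if __name__ == "__main__":
    print(f"Total: ${total_cost(ANNUAL_COSTS):,} per year")
    for name, share in cost_shares(ANNUAL_COSTS).items():
        print(f"  {name}: {share:.0%}")
    hourly = ANNUAL_COSTS["vector database cluster"] / HOURS_PER_YEAR
    print(f"Always-on cluster: ${hourly:,.2f} per hour")
```

Under those assumptions the total comes to $2,469,000, or roughly $2.5 million a year. The reranking API accounts for about 61% of it, and the always-on cluster works out to about $67 for every hour of the year, busy or idle.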
