TokenZip

TokenZip - Open protocol for AI agents to share memory, not tokens

by•
TokenZip Protocol reduces AI-to-AI communication bandwidth by 80% and latency by 95%. Open standard for heterogeneous agents. Try the live demo. Test API Base URL: https://tokenzip.org Auth: Authorization: Bearer demo-investor-key How to call it: See the following comment

Add a comment

Replies

Best
TokenZip
Hunter
📌
Yet in the cutting-edge LLM communications of 2026, we are still brute-forcing massive plaintext strings (Pass-by-Value)?We are effectively paying a pointless "idiot tax" for machines to simply re-read things. This is exactly why I drafted the TokenZip Protocol (TZP). It is the first "Universal Semantic Shared Memory Standard" designed for heterogeneous AI agents. Stop passing tens of thousands of tokens of waffle, and start passing pointers of less than 10 characters. How does it work? Through TZP, Agent A can compress massive context into a unified semantic vector space, cache it at edge nodes (essentially building a semantic CDN for AI), and then only pass a short ID to Agent B, for example: [TZP: tx_8f9A2b]. The result? Bandwidth latency plummets, communication efficiency across different models skyrockets, and most importantly—you save up to 90% on Token costs! Big tech might not lose sleep over a rounding error in their server electricity bills, but as a solo developer, I have to rescue my own credit card. The first API MVP and v1.0 draft of TokenZip are now live on GitHub. It's still an early-stage project, but I firmly believe the "pass-by-reference standard for the AI era" must be defined by the open-source community.
Shubham Kukreti

Interesting, It it better than RAG based systems. I mean, even there you can plug same memory across multiple agents. Any specific reason you chose this approach?

TokenZip
Hunter

@shubham_kukreti When you use RAG, the retrieved chunks change slightly every time based on the query. This constantly modifies the prompt prefix, completely destroying OpenAI and Anthropic's native Prompt Caching. Your cache hit rate drops to zero.

TokenZip does the exact opposite. Because we restore the exact same massive text block via the pointer, we force a 100% cache hit rate on the LLM provider's side. RAG makes API bills higher; TokenZip weaponizes native caching to drop bills by 90%.

To use RAG across agents, a developer needs to spin up Pinecone, pick an embedding model, write chunking logic, and manage vector states. It takes a week.

To use TokenZip, they change api.openai.com to gateway.trexapi.com. It takes 15 seconds. We are building a network layer, not a database.

Shubham Kukreti

@tokenzip understood