TokenSaver is a self-hosted caching proxy that sits between your app and the LLM API. It deduplicates prompts in two ways: SHA-256 hashes for exact matches and sentence-transformers embeddings for near-identical queries. That catches retries, page refreshes, and the same question from different users before they reach the API. It also includes per-user budgets, Slack/email alerts, rate limiting, per-model cost analytics, and anomaly detection. Free for up to 5 users (MIT).
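
To make the two-tier deduplication concrete, here is a minimal sketch of how an exact-match lookup followed by a similarity lookup could work. It is not TokenSaver's actual code: the names `DedupCache` and `prompt_key`, the whitespace normalization, the per-model keying, and the 0.95 threshold are all assumptions for illustration. The embedding function is passed in, so the sketch runs without any model installed.

```python
import hashlib
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def prompt_key(model: str, prompt: str) -> str:
    """SHA-256 hex digest of the model name plus the prompt.

    Collapsing whitespace before hashing is an assumption, not a
    documented TokenSaver behavior.
    """
    normalized = " ".join(prompt.split())
    return hashlib.sha256(f"{model}\n{normalized}".encode("utf-8")).hexdigest()


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class DedupCache:
    """Hypothetical two-tier cache: exact hash first, then embedding similarity."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.95):
        self.embed = embed          # any function mapping text to a vector
        self.threshold = threshold  # assumed cutoff for "near-identical"
        self.exact: Dict[str, str] = {}
        self.entries: List[Tuple[str, Vector, str]] = []

    def get(self, model: str, prompt: str) -> Optional[str]:
        # Tier 1: exact match on the SHA-256 key.
        key = prompt_key(model, prompt)
        if key in self.exact:
            return self.exact[key]
        # Tier 2: best cosine match among cached prompts for the same model.
        query = self.embed(prompt)
        best_score, best_response = 0.0, None
        for cached_model, vector, response in self.entries:
            if cached_model != model:
                continue
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, model: str, prompt: str, response: str) -> None:
        # Store under both tiers so later lookups can hit either one.
        self.exact[prompt_key(model, prompt)] = response
        self.entries.append((model, self.embed(prompt), response))
```

In a real deployment the `embed` argument could wrap the `encode` method of a sentence-transformers model, and the linear scan would likely be replaced by a vector index. The threshold is the main tuning knob: too low and users get answers to questions they did not ask, too high and near-duplicates still reach the API.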