GitHub - Cut LLM token costs 30–60% — local, instant, zero API calls

by Huseyin A.
TokenShrink compresses LLM prompts before they hit the API, stripping filler, hedges, and bloat without changing meaning. 100+ regex rules run locally in under 2 ms: no API calls, no cost, and code blocks and URLs are never touched. It works as a CLI, an MCP server, and a browser extension, supports English and Turkish, and offers an optional Groq upgrade for LLM-quality compression (14,400 free requests/day). At Opus 4's $15 per million input tokens, a heavy user spends roughly $68/yr on input; cutting that in half saves about $34/yr per user, or roughly $340/yr for a team of 10.
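A back-of-envelope check of that arithmetic (the yearly token volume is an assumption chosen to match the ~$68/yr figure, not a number from TokenShrink):

```python
# Illustrative savings math, not TokenShrink's actual accounting.
PRICE_PER_M_INPUT = 15.00           # Opus 4 input price, $ per 1M tokens
yearly_input_tokens = 4_500_000     # assumed heavy user: ~4.5M input tokens/yr

yearly_cost = yearly_input_tokens / 1_000_000 * PRICE_PER_M_INPUT
compression = 0.50                  # midpoint of the claimed 30-60% range
saved_per_user = yearly_cost * compression
team_saved = saved_per_user * 10    # a 10-person team

print(f"${yearly_cost:.2f}/yr per user, "
      f"${saved_per_user:.2f} saved each, "
      f"${team_saved:.2f} for a team of 10")
```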


Replies

Huseyin A.
Maker
I built TokenShrink after staring at my API bill one morning and realizing half my tokens were "I was wondering if you could possibly help me understand..." The fix seemed obvious: strip the fat before the message ever leaves your machine. No roundtrips to an LLM, no latency, no cost — just regex that runs in 2ms. What surprised me was how much filler survives editing. Even technical writers send 40% fluff. Turkish support came next because that's my first language, and Turkish has its own set of politeness openers that bloat prompts even more. The MCP integration was the unlock — now Claude Desktop silently compresses every message before it's sent. Users don't notice. They just see a smaller bill. Would love feedback on the rule set — what filler patterns do you keep seeing that I haven't caught yet?
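For anyone curious what "strip the fat, but never touch code or URLs" looks like in practice, here is a minimal sketch of the placeholder-protect-then-substitute pattern. The two rules shown are hypothetical examples, not TokenShrink's actual rule set:

```python
import re

# Hypothetical filler rules for illustration (TokenShrink ships 100+).
FILLER = [
    (re.compile(r"\bI was wondering if you could possibly\b", re.I), "please"),
    (re.compile(r"\b(?:just|really|very|basically|actually)\s+", re.I), ""),
]

# Spans that must survive untouched: fenced code blocks and URLs.
PROTECT = re.compile(r"```.*?```|https?://\S+", re.S)

def shrink(text: str) -> str:
    saved: list[str] = []

    def stash(m: re.Match) -> str:
        # Swap each protected span for a placeholder so no rule can touch it.
        saved.append(m.group(0))
        return f"\x00{len(saved) - 1}\x00"

    text = PROTECT.sub(stash, text)
    for pattern, replacement in FILLER:
        text = pattern.sub(replacement, text)
    # Restore code blocks and URLs exactly as they were.
    return re.sub(r"\x00(\d+)\x00", lambda m: saved[int(m.group(1))], text)
```

With those two rules, `shrink("I was wondering if you could possibly help me")` comes back as `"please help me"`, while a URL containing a filler word passes through unchanged.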