TwoTrim - Cut LLM API costs by 65%. No GPU. No code changes.
byā¢
TwoTrim ā The Mathematical Prompt Compression Fabric for LLM APIs. Cut up to 65% of your AI token costs without losing accuracy.
Replies
Best
Hunter
š
TwoTrim is an open-source prompt compression middleware for LLM applications.
It sits between your app and any LLM API ā OpenAI, Anthropic, or any OpenAI-compatible endpoint ā and removes the tokens your model doesn't need
before the request is sent. Your code doesn't change. Your costs do.
What it does:
ā Strips filler words, redundant sentences, and formatting noise (lossless)
ā Semantic sentence scoring + Lost-in-the-Middle reordering (balanced)
ā BART summarization for long documents (aggressive)
ā FAISS semantic cache ā works on similar queries, not just identical ones
What makes it different:
ā CPU-only. No GPU infrastructure required.
ā Zero refactoring ā drop-in base_url swap for any OpenAI-compatible client
ā Works across providers via LiteLLM, vLLM, and more
ā Honest benchmarks. The results where it fails are published too.
Works best on: document summarization, long-context tasks, and high-volume chatbot/support systems with repeated queries.
Does not work well on: extreme multi-hop RAG at aggressive compression.
Full benchmark data is public in the repo.
Open source. Apache 2.0. Free forever.
github.com/overseek944/twotrim
Replies