Hey, I'm Sacha, co-founder at @Edgee
Over the last few months, we've been working on a problem we kept seeing in production AI systems:
LLM costs don't scale linearly with usage; they scale with context.
As teams add RAG, tool calls, long chat histories, memory, and guardrails, prompts become huge and token spend quickly becomes the main bottleneck.
So we built a token compression layer designed to run before inference.
Edgee
Hey PH 👋
We're launching the Codex Compressor today.
But first, what is Edgee?
Edgee is an AI Gateway for Coding Agents, and it helps you save tokens. It's really simple to use: you only need two commands:
The first installs the Edgee CLI with curl or brew
The second is a simple edgee launch codex
That's it! And it works the same with Claude Code.
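For reference, the whole setup looks like this (the exact install URL, brew formula, and Claude Code subcommand are illustrative, check the Edgee docs for the real ones):

```shell
# 1. Install the Edgee CLI (via brew or curl; formula/URL illustrative)
brew install edgee

# 2. Launch Codex through the Edgee gateway
edgee launch codex

# Claude Code works the same way (subcommand name assumed):
edgee launch claude
```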
The results:
As a gateway, Edgee can optimize the requests that are sent to OpenAI, remove noise and waste, and cut input token usage almost in half.
We ran a controlled benchmark (see the video): same repo, same model (gpt-5.4), same task sequence.
One session with plain Codex, one with Codex routed through Edgee.
Input tokens: −49.5%
Total cost: −35.6%
Cache hit rate: from 76.1% to 85.4%
The cache hit rate improvement is the part I find most interesting. By sending leaner prompts, @OpenAI cache is hit more often, so the savings compound beyond just the compression ratio.
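To make the compounding concrete, here's a rough back-of-the-envelope model (not Edgee's accounting: the cached-token discount is an assumption — OpenAI bills cached input at a reduced, model-dependent rate, and 75% off is used here purely for illustration):

```python
# Back-of-the-envelope: leaner prompts + a higher cache hit rate compound.
# ASSUMPTION: cached input tokens are billed at 0.25x full price; the real
# discount depends on the model's pricing.

def effective_input_cost(tokens: float, hit_rate: float, cached_multiplier: float = 0.25) -> float:
    """Cost in 'full-price token' units: cached tokens cost less, the rest cost full price."""
    return tokens * (hit_rate * cached_multiplier + (1 - hit_rate))

baseline = effective_input_cost(1_000_000, 0.761)    # plain Codex: 76.1% cache hits
compressed = effective_input_cost(505_000, 0.854)    # -49.5% tokens, 85.4% cache hits

savings = 1 - compressed / baseline
print(f"effective input-cost reduction: {savings:.1%}")  # larger than the raw 49.5% token cut
```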
Here's what makes this different from other token compression tools: we pull token counts directly from the OpenAI API usage fields. No character-based estimates. The numbers are what you're actually billed for.
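If you want to sanity-check the numbers yourself, the billed counts are right there on the response. A minimal sketch, assuming the OpenAI Chat Completions usage schema (`prompt_tokens`, `completion_tokens`, `prompt_tokens_details.cached_tokens`); the sample payload below is made up:

```python
# Read billed token counts from the API response's usage object instead of
# estimating from character counts. Sample payload is illustrative only.
sample_usage = {
    "prompt_tokens": 12_480,
    "completion_tokens": 310,
    "total_tokens": 12_790,
    "prompt_tokens_details": {"cached_tokens": 10_240},
}

def billed_breakdown(usage: dict) -> dict:
    cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    return {
        "input": usage["prompt_tokens"],
        "input_cached": cached,
        "input_uncached": usage["prompt_tokens"] - cached,
        "output": usage["completion_tokens"],
    }

print(billed_breakdown(sample_usage))
```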
⭐️ Please give our brand-new OSS repository a star, we need the support ;)
And don't hesitate to try it, it's free!
Happy to answer any questions here all day. 🙏
@sachamorard s/o for this new launch -- keep up the great work
Coolest launch of the day!! Btw what kinds of transformations are you applying like semantic compression, deduplication, summarization or something else???
Edgee
@lak7 We're doing token-level compression, not semantic. Concretely, we clean the tool results: smart filtering (strip ANSI codes, progress bars, whitespace noise), deduplication (collapse repeated log lines with counts), grouping (aggregate similar items), and truncation (keep the signal, cut the redundancy).
No summarization, no embedding-space compression. The approach stays fully transparent and deterministic: what gets sent to the model is readable and debuggable, just leaner.
The biggest gains come from tool outputs like cargo build, git log, go test... formats designed for humans, not for models. That's where the −93% on cargo comes from.
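To illustrate the flavor of those transformations, here's a toy sketch (my own simplification, not Edgee's actual pipeline): strip ANSI escapes, collapse consecutive duplicate lines with a count, and truncate long output while keeping head and tail.

```python
import re
from itertools import groupby

ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")  # CSI escapes: colors, cursor moves

def clean_tool_output(text: str, max_lines: int = 200) -> str:
    """Toy token-level cleanup: strip ANSI codes and trailing whitespace,
    collapse consecutive duplicate lines with a count, truncate long output."""
    lines = [ANSI_RE.sub("", ln).rstrip() for ln in text.splitlines()]
    out = []
    for line, run in groupby(lines):  # deduplicate consecutive repeats
        n = sum(1 for _ in run)
        out.append(line if n == 1 else f"{line}  (x{n})")
    if len(out) > max_lines:  # keep the head and tail, cut the middle
        keep = max_lines // 2
        omitted = len(out) - 2 * keep
        out = out[:keep] + [f"... [{omitted} lines omitted] ..."] + out[-keep:]
    return "\n".join(out)
```

Deterministic rules like these keep the compressed prompt human-readable, which is the property the reply above is describing.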
100%, @Edgee is underrated IMHO
That's a very specific figure — I like that. Is that an average across user sessions or a median? And what does the distribution look like — are most users clustered around that number or is it more bimodal between light and heavy usage?
Edgee
@ryanwmcc1 Great question, and I want to be upfront: this is a single benchmark run, not an aggregate across user sessions. The −49.5% figure comes from one controlled test, same repo, same model, same task sequence, so it's a point measurement rather than a statistical distribution.
That said, the compression ratio in our architecture isn't random. It tracks directly with how much redundant context accumulates in a session, which is a function of session length, tool call frequency, and how repetitive the tool outputs are. Cargo build output, for example, is extremely compressible (−93% in this run) because it's verbose and structurally repetitive. File reads are less so (−34%).
If we look at the average compression rate across Codex sessions, we're closer to −40% on input tokens, since it depends heavily on how the developer uses Codex.
35.6% is an oddly specific number, which makes me trust it more than "save up to 50%." What's actually being compressed, prompt-side context pruning, response caching, or something closer to semantic dedup across a session? Asking because I've been eyeing my own API bill lately and the honest breakdown matters.
Very cool product!
I've been using it for 3 weeks and it's very efficient. A game changer to control API costs.
Edgee
@picsoung haha, thanks. I would do anything to help friends save tokens.
@picsoung @sachamorard "friends don't let friends waste tokens."
UXPin Merge
Really interesting concept. Token compression plus routing in one layer feels powerful. How do you decide what gets compressed without affecting output quality?