
Edgee
The AI Gateway that TL;DR tokens
Edgee compresses prompts before they reach LLM providers and reduces token costs by up to 50%. Same code, fewer tokens, lower bills.





@curiouskitty Great question, and totally valid concern!
We're edge-native, so we avoid adding a centralized bottleneck and keep network hops minimal. Edgee is running on more than 100 points of presence around the world, on more than 10k servers, and we already process 3B+ requests a month ;)
Streaming is first-class, and pre-inference workloads run before the model call, so they don't block token streaming.
On reliability: we don't do blind retries. Routing is health-aware, with bounded retries, circuit-breaker behavior, and dynamic deprioritization during brownouts to avoid traffic amplification.
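Roughly, the routing pattern looks like this (a minimal sketch with illustrative names, not our actual gateway code): each provider carries a small health record that opens a circuit after repeated failures, and a request only gets a bounded number of attempts across the healthy candidates.

```python
# Minimal sketch of health-aware routing with bounded retries and a
# per-provider circuit breaker -- illustrative only, not Edgee's code.
import time

class ProviderHealth:
    def __init__(self, failure_threshold=5, cooldown_s=30.0):
        self.failures = 0
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.opened_at = None  # set when the circuit opens

    def available(self):
        if self.opened_at is None:
            return True
        # half-open after the cooldown: allow a probe request through
        return time.monotonic() - self.opened_at > self.cooldown_s

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # open the circuit


def route(request, providers, health, max_attempts=2):
    """Try healthy providers first, spending at most max_attempts in total."""
    # deprioritize providers whose circuit is currently open
    candidates = sorted(providers, key=lambda p: not health[p].available())
    attempts = 0
    for provider in candidates:
        if attempts >= max_attempts:
            break
        if not health[provider].available():
            continue  # circuit open: skip instead of amplifying traffic
        attempts += 1
        try:
            response = provider.complete(request)  # hypothetical provider call
            health[provider].record_success()
            return response
        except Exception:
            health[provider].record_failure()
    raise RuntimeError("no healthy provider within the retry budget")
```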
To summarize, Edgee will be to AI what CDNs were to the web.
Happy to go deeper if helpful!
nao
Hey, this is interesting! I was wondering whether the prompt optimisations you're doing are deterministic. The first layer of cost improvement is caching: with a long conversation with an LLM you need to cache, so the prompt compaction needs to be deterministic and stable whatever happens.
Second point: how do you handle the different model providers' API interfaces? Do you support SSE? Did you reimplement your own layer between the Edgee SDK and LLM providers? There are so many edge cases with each provider when it comes to streaming + tools + reasoning tokens, etc.
Batch
@sachamorard Token costs are definitely becoming a real problem once prompts get large (RAG, tools, agents…).
Curious how you handle compression without breaking output quality, especially for structured outputs?
@sachamorard @virtualgoodz Yeah, alignment is a big issue when doing any prompt transformation!
In general, tracking performance across a mix of semantic-preservation metrics (BERTScore, cosine similarity, ROUGE) and making sure they don't degrade below a certain threshold is a good proxy.
For structured output, things are trickier: the compression shouldn't be "generative", in the sense of re-expressing content with other tokens. Instead it stays deterministic, through a more compact re-encoding of the structure: crushing whitespace, factorizing repetitions, and so on!
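To make that "deterministic re-encoding" idea concrete, here's a toy sketch (illustrative names and format, not Edgee's actual algorithm): it factors the repeated keys of homogeneous object lists into a single header, so the same input always produces the same compacted output.

```python
# Toy sketch of deterministic, non-generative compaction -- not Edgee's code.
import json

def compact(value):
    """Compact JSON deterministically: recurse, and factor the repeated keys
    of a homogeneous list of objects into one shared header plus rows."""
    if (
        isinstance(value, list)
        and len(value) > 1
        and all(isinstance(item, dict) for item in value)
        and len({tuple(sorted(item)) for item in value}) == 1
    ):
        keys = sorted(value[0])
        # keys are stated once instead of being repeated for every item
        return {"_cols": keys,
                "_rows": [[compact(item[k]) for k in keys] for item in value]}
    if isinstance(value, dict):
        return {k: compact(v) for k, v in value.items()}
    if isinstance(value, list):
        return [compact(v) for v in value]
    return value

records = [
    {"sku": "A1", "qty": 3, "price": 9.5},
    {"sku": "B2", "qty": 1, "price": 4.0},
]
print(json.dumps(compact(records), separators=(",", ":")))
# {"_cols":["price","qty","sku"],"_rows":[[9.5,3,"A1"],[4.0,1,"B2"]]}
```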
Glad to discuss this further if needs be :D
Plezi
Congrats on the launch!
We're stuck on how to attribute LLM costs back to specific features. Does Edgee tag requests so we can track cost per feature?
Hello @benoit_collet, thanks for the interest!
Good question: it's a pain we've experienced ourselves. When cost is only analyzable by API key, you end up juggling 50 different keys just for the sake of cost categorization.
We've created the "tags" feature, which lets you (via API headers or via our SDKs) define categories automatically. Tags show up in your analytics dashboard so you can see exactly where you're spending the most!
You can learn more in our documentation: https://www.edgee.ai/docs/integrations/langchain#tags
That page is part of our LangChain SDK docs and goes more in depth into what tags really are.
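For illustration, tagging a call looks roughly like this (the gateway URL and header name below are placeholders; the exact names are in the docs linked above):

```python
# Hypothetical example of per-feature tagging -- placeholder URL and header name.
import os
import requests

GATEWAY_URL = "https://gateway.example.com/v1/chat/completions"  # placeholder endpoint

def ask(prompt, feature_tag):
    """Send one chat completion through the gateway, tagged with a feature name."""
    response = requests.post(
        GATEWAY_URL,
        headers={
            "Authorization": f"Bearer {os.environ['EDGEE_API_KEY']}",
            # hypothetical tag header -- see the docs above for the real name
            "X-Edgee-Tags": feature_tag,
        },
        json={
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# Costs then roll up per tag in the analytics dashboard:
ask("Summarize this support ticket ...", feature_tag="support-summary")
```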
Typeform
As an indie hacker, I'm always afraid of receiving an expensive bill because my AI feature suddenly saw a lot of usage. Anything that helps reduce costs and gives me insight into what's going on is welcome.
It's a no-brainer to use it from day 1 and see value right away.
Congrats @sachamorard team for building this 💪
Thanks a lot @picsoung for the support 🙌
And totally agree! That "unexpected AI bill" fear is real, especially for indie hackers and small teams where one spike can ruin the month 😅
That's exactly why we built Edgee: so you can get cost visibility + optimizations (like token compression) from day one, before things get out of control.
Really appreciate you hunting and sharing this. Excited to hear what you build with it! 🚀
@sachamorard @picsoung We've heard this from pretty much every CTO and CEO we've talked to in Europe and the US. The end-of-month bill can be a real shock! 💸
Go Edgee! Would love to know if you handle MCP and tool-usage optimisations? It's a real pain for long-running agents.
Hey @marek_kalnik! We don't manage MCPs for now, but we have developed Edge Tools.
These are tools executed at the gateway level, before or after the call to the model. They can be verifications, transformations, enrichments, controls... even memory access!
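Conceptually it's a pre/post hook pattern around the model call. Here's a minimal sketch of that pattern with made-up names (not the actual Edge Tools API):

```python
# Illustrative pre/post-call hooks -- hypothetical names, not the Edge Tools API.
import json
import re

def redact_emails(prompt: str) -> str:
    """Pre-call transformation: strip email addresses before inference."""
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[email]", prompt)

def enforce_json(output: str) -> str:
    """Post-call control: fail fast if the model response is not valid JSON."""
    json.loads(output)  # raises if malformed
    return output

def call_with_tools(prompt, model_call, pre=(redact_emails,), post=(enforce_json,)):
    """Run gateway-side tools before and after the actual model call."""
    for tool in pre:        # executed at the gateway, before the model sees the prompt
        prompt = tool(prompt)
    output = model_call(prompt)
    for tool in post:       # executed on the model's response
        output = tool(output)
    return output
```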
Token costs are the new database query problem. This feels like the right abstraction layer.
How's the latency impact in practice?