
Edgee
The AI Gateway that TL;DR tokens
221 followers
Edgee compresses prompts before they reach LLM providers and reduces token costs by up to 50%. Same code, fewer tokens, lower bills.
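"Same code" refers to the usual drop-in gateway pattern: you point your existing client at the gateway's base URL and change nothing else. A minimal sketch of that pattern, assuming an OpenAI-compatible endpoint (the URL, key, and model below are illustrative placeholders, not Edgee's documented API):

from openai import OpenAI

# Hypothetical drop-in usage: swap the provider's base URL for the
# gateway's and keep the rest of the calling code unchanged.
client = OpenAI(
    base_url="https://gateway.example.com/v1",  # placeholder gateway endpoint
    api_key="YOUR_GATEWAY_KEY",                 # placeholder credential
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # requests pass through to the underlying provider
    messages=[{"role": "user", "content": "Summarize this long document ..."}],
)
print(response.choices[0].message.content)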

Token compression at the gateway level is a smart approach. I've been watching my AI API costs climb across multiple projects, and this is exactly the kind of infra that makes shipping AI features viable without stressing about the bill.
This would be game-changing for our margins. Does the compression work for both prompts and completions?
@hajar_lamjadab2 Yes, it does! And it's even more efficient as the context window grows larger.
Cloudthread
Cool idea! Do you get transparency into how the prompt was trimmed/manipulated, so you can ensure nothing was missed?
@daniele_packard We have information that allows us to understand how our model performs, yes. However, we do not keep the original prompt, for obvious privacy reasons. To validate the compressed prompt, we run a similarity analysis across several metrics (ROUGE, BERTScore, cosine similarity, ...), and we let users define a threshold that guarantees semantic similarity.
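For illustration, a minimal sketch of what such a semantic-similarity gate can look like, using sentence-transformers embeddings and cosine similarity; the model choice and threshold value are assumptions for the example, not Edgee's actual implementation:

from sentence_transformers import SentenceTransformer, util

# Illustrative only: accept a compressed prompt only if its embedding
# stays close enough to the original's.
model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice

def passes_threshold(original: str, compressed: str, threshold: float = 0.90) -> bool:
    # Encode both texts and compare with cosine similarity; `threshold`
    # plays the role of the user-defined semantic-similarity floor.
    embeddings = model.encode([original, compressed], convert_to_tensor=True)
    score = util.cos_sim(embeddings[0], embeddings[1]).item()
    return score >= threshold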
Connectiviteam
Congrats!
@stanmassueras an honour to have your support. At @Edgee, we loooove @ElevenLabs 💪
@stanmassueras Thank you! We really appreciate the support 🙏
If you end up giving Edgee a try, we’d love to hear your feedback.
Congrats on the launch!
LLM costs are going crazy here, I'll definitely give it a try.
You'll be welcome, @angezanetti. We decided to build Edgee after talking with 50+ CTOs who were starting to struggle with token costs. Really exciting challenge, the team is sooo excited!
nao
Hey, this is interesting! I was wondering if the prompt optimisations you're doing are deterministic. The first layer of cost improvement is caching: in a long conversation with an LLM you need to cache, so the prompt compaction needs to be deterministic and stable no matter what happens.
Second point: how do you handle the different model providers' API interfaces? Do you support SSE? Did you reimplement your own layer between the Edgee SDK and the LLM providers? There are so many edge cases with each provider when it comes to streaming + tools + reasoning tokens, etc.
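To make the determinism concern above concrete: a small sketch of why caching depends on it, assuming a hypothetical compress() stand-in (not Edgee's API). If compress() maps the same input to different outputs across calls, the derived cache key changes and provider-side prompt caching never hits:

import hashlib

def compress(prompt: str) -> str:
    # Hypothetical stand-in for the gateway's compression step; here it
    # just normalizes whitespace, which is trivially deterministic.
    return " ".join(prompt.split())

def cache_key(prompt: str, model: str) -> str:
    # Cache lookups key on the compressed prompt, so any non-determinism
    # in compress() silently defeats the cache.
    return hashlib.sha256(f"{model}:{compress(prompt)}".encode()).hexdigest()

# Identical (or equivalently compressed) requests must produce identical keys:
assert cache_key("Hello   world", "gpt-4o") == cache_key("Hello world", "gpt-4o")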
Love this! Congrats @sachamorard - great onboarding experience, managed to get going in under 5 minutes ❤️. Curious whether and how we can control the compression level and adjust it per endpoint or use case, as I imagine there's a quality trade-off?
@sachamorard Super clear. Thanks!