Launched this week

Context Gateway
Make Claude Code faster and cheaper without losing context
248 followers
Context Gateway cuts latency and token spend for Claude Code / Codex / OpenClaw by compressing tool output while preserving important context. Setup takes less than a minute. Quality-of-life features include instant context compaction and a configurable spend limit for Claude Code.

Congrats on the launch! Curious how the compression handles tool outputs that contain mixed content, structured data alongside verbose logs, for example. Does it preserve the structured parts reliably while trimming the noise, or is it more of a blunt summarization?
Context Gateway
@joao_seabra Thanks for the question!
Right now we don’t explicitly differentiate between structured and unstructured data and the compression runs across the tool outputs as they are. Even with that simple approach we’re seeing pretty significant gains in accuracy and reduction of cost and latency.
That being said, you’re touching on something we’re actively working on. Our next major update will start treating structured and unstructured parts differently, handling things like JSON/schema fields atomically while being more aggressive with verbose logs.
Expect improvements here soon.
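For readers curious what that structure-aware approach could look like, here is a minimal hypothetical sketch (not Context Gateway's actual pipeline): JSON spans found in tool output are kept verbatim when they parse, while surrounding prose/log lines are trimmed. The `keep_ratio` default of 0.5 mirrors the fixed compression ratio the team mentions elsewhere in the thread; the line-truncation "compression" is a naive stand-in for a real summarizer.

```python
import json
import re


def _trim(prose: str, keep_ratio: float) -> str:
    """Naive stand-in for real compression: keep the first N% of lines."""
    lines = prose.splitlines(keepends=True)
    keep = max(1, int(len(lines) * keep_ratio)) if lines else 0
    return "".join(lines[:keep])


def compress_tool_output(text: str, keep_ratio: float = 0.5) -> str:
    """Keep JSON blocks intact; trim the unstructured text around them.

    Hypothetical sketch. The non-greedy regex is naive and will mis-split
    nested JSON; a production version would need a real brace-matching scanner.
    """
    parts = []
    pos = 0
    for match in re.finditer(r"\{.*?\}", text, flags=re.DOTALL):
        parts.append(_trim(text[pos:match.start()], keep_ratio))
        candidate = match.group(0)
        try:
            json.loads(candidate)
            parts.append(candidate)  # structured: preserved atomically
        except ValueError:
            parts.append(_trim(candidate, keep_ratio))  # not valid JSON: trim
        pos = match.end()
    parts.append(_trim(text[pos:], keep_ratio))
    return "".join(parts)
```

On a noisy tool result like four log lines followed by `{"a": 1}`, this keeps the JSON untouched and drops the back half of the log, which is the shape of behavior the question above is asking about.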
Really smart approach to a problem I hit constantly - agent tool calls returning massive outputs that bloat context and burn tokens. The instant compaction feature is clutch too; waiting 3 min for /compact in Claude Code always kills my flow. Curious how the compression models handle code-heavy outputs vs prose - do you see different compression ratios?
Context Gateway
Hey @emad_ibrahim , thank you! The compression ratio is currently fixed at 0.5 - we'll make it auto-tunable in the future to account for the varying "density" of different inputs, but empirically the fixed ratio already works well!
The spend cap and Slack notifications are almost more valuable than the compression itself. Running Claude Code on a large codebase without any spending guardrails is genuinely stressful. You check back after 20 minutes and it's burned through $40 on a rabbit hole.
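The spend-cap idea above is easy to picture as a small accumulator around each model call. A hypothetical sketch (not Context Gateway's implementation; the per-token prices are placeholder assumptions, and the Slack hook is reduced to an alert list):

```python
from dataclasses import dataclass, field


@dataclass
class SpendGuard:
    """Hypothetical per-session spend cap; prices are placeholder assumptions."""
    limit_usd: float
    spent_usd: float = 0.0
    alerts: list = field(default_factory=list)

    def record(self, prompt_tokens: int, completion_tokens: int,
               in_price: float = 3e-6, out_price: float = 15e-6) -> bool:
        """Add one call's cost; return False once the cap is reached."""
        self.spent_usd += prompt_tokens * in_price + completion_tokens * out_price
        if self.spent_usd >= self.limit_usd:
            # In a real gateway this would stop the session and ping Slack.
            self.alerts.append(f"Spend cap ${self.limit_usd:.2f} reached")
            return False
        return True
```

The caller checks the return value after every tool/model call, which is what turns the "check back after 20 minutes and it's burned $40" scenario into a hard stop plus a notification.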
Is the compression lossy in practice? I've seen context window summaries drop important details (like specific variable names or error messages) that then cause the agent to hallucinate fixes. How do you handle preserving the details that actually matter vs. trimming the boilerplate?
Told
The token compression angle is the right problem to attack — once devs start hitting context limits mid-session, the cognitive cost of managing that manually kills flow. Curious how the compression handles cases where the 'noise' in tool output turns out to be context a later step actually needed — that edge case is where these systems tend to break trust with developers. The Claude Code integration is smart timing given how fast that tool's adoption is moving right now. Would be interested to see what the latency reduction looks like in practice on a typical 30-minute coding session.
BlocPad - Project & Team Workspace
Oh man the instant compaction alone is worth it. I've been hitting /compact in Claude Code and just staring at the screen for like 3 minutes every time my context gets bloated. The spend cap + Slack notifications combo is also super practical, I've definitely had sessions where I looked away for a bit and came back to a surprisingly large bill lol
Context Gateway
Hey @mihir_kanzariya , completely agree! As a matter of fact, we just built what we wanted to use ourselves :))