Launched this week

Sipcode
Keep Claude Code's context clean for sharper answers
168 followers
Context hygiene for Claude Code. Caps verbose tool output and dedupes same-session re-reads so the model sees signal, not noise. Anthropic measures 29% quality lift from cleaner context. Proof: 62.6% median tool-output savings on a locked 20-task benchmark. MIT.

Tendem by Toloka
Hey, congrats!
A couple of questions.
Have you measured quality in some way? I mean speed/quality on specific tasks.
Also, is it configurable so that Claude can "disable" it when needed, for example if it thinks the hook over-stripped the content?
Thanks!
Sipcode
@perrymason Hey Viacheslav, thanks for the early look and the real questions.
On quality measurement: no controlled A/B on real user tasks yet. What I measure directly is per-rewriter signal kept (every rewriter declares an integrity score on each fire), tool-output savings on a locked 20-task benchmark (62.6% median, range 37.4% to 80.6%, reproducible via sipcode benchmark from the repo), and per-session proxy stats.
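To make "median tool-output savings" concrete: one natural reading is that each task's savings is one minus the ratio of tool-output tokens with and without Sipcode, summarized as a median and range across the tasks. The sketch below assumes that definition. The repo's sipcode benchmark may compute it differently, the function names are hypothetical, and the numbers in the usage line are made up.

```python
from statistics import median


def task_savings(tokens_without: int, tokens_with: int) -> float:
    """Fraction of tool-output tokens saved on one task (assumed definition)."""
    if tokens_without <= 0:
        raise ValueError("baseline token count must be positive")
    return 1.0 - tokens_with / tokens_without


def summarize(runs: list[tuple[int, int]]) -> dict[str, float]:
    """Median, min and max savings over (tokens_without, tokens_with) pairs, one per task."""
    savings = [task_savings(before, after) for before, after in runs]
    return {"median": median(savings), "min": min(savings), "max": max(savings)}


if __name__ == "__main__":
    # Made-up token counts for three hypothetical tasks, not benchmark data.
    print(summarize([(1000, 400), (2000, 1000), (500, 100)]))
```

Reporting the median together with the min and max, rather than a single average, keeps one unusually compressible task from standing in for the typical case.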
The 29% quality lift number is Anthropic's published research, not mine. I am careful not to claim Sipcode users specifically see 29%. The gap between "context got cleaner" (measurable) and "answers got better by X%" (requires controlled experiments) is real and I would rather flag it than oversell.
On configurability, three layers:
Per-tool-call: if Claude passes an explicit parameter (head_limit on Grep, count output mode, explicit offset on Read), the relevant rewriter detects the user-supplied value and steps aside. Claude can effectively opt out of compression for a specific call by being explicit. Rewriters skip rather than fight.
Per-rewriter selective disable via env var or config: not shipped yet. Honest gap. Today a user who hits over-stripping either passes an explicit param on that call or removes the proxy entirely via sipcode proxy --uninstall.
Per-session bypass triggered from inside the agent: also not shipped. Your specific scenario, where Claude itself decides "this hook over-stripped, back off for now", is a really good design idea I have not built. The per-fire integrity scores are there, so the data exists. Wiring it to an agent-side self-modulation primitive is something I want to think about for v1.7.
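The per-tool-call layer can be pictured as a check that runs before any rewrite: if Claude already set its own limits, the rewriter returns the input untouched. The following is a minimal sketch, assuming a rewriter sees the tool name and its input as a dict and caps Grep by adding a head_limit. The EXPLICIT table, maybe_cap, Decision, and the default of 200 are hypothetical and are not Sipcode's actual code. head_limit, output_mode and offset are the Claude Code tool parameters mentioned above.

```python
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any

# Hypothetical: per tool, what counts as Claude "being explicit" about output size.
EXPLICIT: dict[str, Callable[[dict[str, Any]], bool]] = {
    "Grep": lambda inp: "head_limit" in inp or inp.get("output_mode") == "count",
    "Read": lambda inp: "offset" in inp,
}

DEFAULT_HEAD_LIMIT = 200  # hypothetical cap, not Sipcode's real default


@dataclass
class Decision:
    tool_input: dict[str, Any]
    rewritten: bool


def maybe_cap(tool_name: str, tool_input: dict[str, Any]) -> Decision:
    """Cap Grep output unless Claude passed its own limits; never mutate the input."""
    is_explicit = EXPLICIT.get(tool_name, lambda _inp: False)(tool_input)
    if is_explicit:
        # Claude chose its own limits: skip rather than fight.
        return Decision(tool_input, rewritten=False)
    if tool_name == "Grep":
        # Return a new dict with a cap added; the caller's dict is left as-is.
        return Decision({**tool_input, "head_limit": DEFAULT_HEAD_LIMIT}, rewritten=True)
    return Decision(tool_input, rewritten=False)
```

Because the explicit-parameter check runs first, an agent that wants uncapped results for one call only has to ask for them explicitly. No separate off switch is needed for that case.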
Tendem by Toloka
@axlerodd Thanks, glad you're thinking about improving the product :)
Regarding quality, I think the goal may not even be to boost it, but rather to sustain a similar quality level (a 5-10% drop can be acceptable) while token usage falls 50/60/70%. However, that requires thorough benchmarking, since performance may also differ across tasks. And that's something many compression products lack... Glad you don't hide that :)
Sipcode
@perrymason Viacheslav, that "sustaining" framing is sharper than where I was. Honest base case: same quality, lower tokens. Anything else is gravy.
A real benchmark needs a corpus someone else picks (not mine, since curated by me = optimized by me), agent-eval style with verifiable outcomes. Toggle sipcode on/off, then measure agreement rate or judged quality.
I have not built it yet. Publicly committing to build it is on my list. If you have seen any agent-eval frameworks that handle seed/temperature non-determinism well, I would love a pointer.
Easier to flag the gap than ship a slogan I cannot back up.
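The on/off comparison described above could be scored roughly as follows. This is a sketch under stated assumptions: each task has a verifiable pass/fail outcome, and each arm is run several times (different seeds) to absorb non-determinism. The function names and the majority-vote rule for "agreement" are illustrative choices, not an existing framework or anything Sipcode ships.

```python
from statistics import mean


def pass_rate(results: list[bool]) -> float:
    """Share of repeated runs (e.g. different seeds) that passed verification."""
    return mean(1.0 if r else 0.0 for r in results)


def compare(off: dict[str, list[bool]], on: dict[str, list[bool]]) -> dict[str, float]:
    """off/on map task id -> verified outcomes over repeated runs, without/with compression."""
    tasks = sorted(off.keys() & on.keys())
    if not tasks:
        raise ValueError("no tasks in common")
    off_rates = [pass_rate(off[t]) for t in tasks]
    on_rates = [pass_rate(on[t]) for t in tasks]
    # Illustrative rule: a task "agrees" when its majority outcome (ties count as pass)
    # is the same with and without compression.
    agree = [(a >= 0.5) == (b >= 0.5) for a, b in zip(off_rates, on_rates)]
    return {
        "off_pass_rate": mean(off_rates),
        "on_pass_rate": mean(on_rates),
        "agreement": mean(1.0 if x else 0.0 for x in agree),
    }
```

Comparing per-arm pass rates speaks directly to the "sustain quality at lower tokens" framing. Agreement shows whether compression flips outcomes on individual tasks even when the averages look the same.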
Congrats on the launch! Keeping Claude Code context clean is a very real pain point for anyone building with AI coding tools. I like the focus on sharper answers instead of just longer context. How are you deciding what should stay in context versus what should be summarized or dropped?
Sipcode
@rahulbhavsar Thanks Rahul. The rule is intentionally boring: I never summarize and I never drop anything model-facing. I only rewrite where I can prove the rewrite preserves every fact Claude could realistically need next.
So for Bash output I cap volume (head_limit on grep, truncating npm install walls). For Read I dedup byte-identical re-reads inside the same session, with a hash check against current disk so changed files always pass through. Each rewriter declares a 0-1 integrity score so the savings number is never decoupled from how lossy the rewrite is.
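The Read dedup described here can be sketched as a per-session map from file path to the hash of the bytes last delivered. The file is re-hashed from disk on every read, so any edit produces a new hash and the full content passes through. This is a minimal illustration, assuming SHA-256 over raw bytes and a short stub for unchanged re-reads. ReadDeduper, on_read, and the stub wording are hypothetical, and the integrity value of 1.0 reflects the lossless-only rule stated above.

```python
import hashlib
from pathlib import Path


class ReadDeduper:
    """Per-session record of which exact file bytes the model has already seen."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}  # resolved path -> sha256 of bytes last delivered

    def on_read(self, path: str) -> tuple[str, float]:
        """Return (text for the model, integrity score in 0-1)."""
        p = Path(path).resolve()
        data = p.read_bytes()  # always hash what is on disk now, not a cached copy
        digest = hashlib.sha256(data).hexdigest()
        key = str(p)
        if self._seen.get(key) == digest:
            # Byte-identical to an earlier read this session: send a short pointer instead.
            # Scored 1.0 on the assumption that the earlier copy is still in context.
            return f"[unchanged since earlier read of {key}]", 1.0
        self._seen[key] = digest  # first read, or the file changed: pass it through
        return data.decode("utf-8", errors="replace"), 1.0
```

Hashing content rather than trusting modification times means a file that was touched but not changed still dedups, and a file that changed always passes through.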
Semantic summarization and importance ranking are higher-leverage and I have research on both, but neither clears the bar I've set for shipping into someone else's session. Lossless first, lossy never.
Are you hitting a case where you wish it dropped more aggressively, or kept more?
Loomal
Clever approach to context hygiene — deduping same-session re-reads is the kind of thing that seems obvious in hindsight but nobody built it. Curious whether the 62% savings hold up on projects with lots of large files, or does it vary a lot by codebase structure?
Sipcode
@dannyheng Great question, and the honest answer is yes, it varies a lot by codebase. The 62.6% is the median on a locked 20-task corpus; the published range is 37.4% to 80.6%, and that spread is almost entirely structure.
Two things drive the savings: capping verbose tool output, and deduping same-session re-reads. Large-file projects usually land toward the higher end, not the lower. Re-reading a 2,000-line file six times is six times the tokens, and dedup collapses that to one, so the more re-read-heavy the work, the more there is to recover.
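Here is a back-of-envelope version of the six-reads example. It assumes every re-read after the first is replaced by a small fixed-size stub. The stub size and file sizes below are made-up numbers, and dedup_savings is a hypothetical helper.

```python
def dedup_savings(reads: int, file_tokens: int, stub_tokens: int = 20) -> float:
    """Share of read tokens saved when only the first of `reads` identical reads is sent in full.

    Assumes each later re-read costs a fixed `stub_tokens` (hypothetical size).
    """
    if reads < 1 or file_tokens <= 0:
        raise ValueError("need at least one read of a non-empty file")
    before = reads * file_tokens
    after = file_tokens + (reads - 1) * stub_tokens
    return 1 - after / before


if __name__ == "__main__":
    print(dedup_savings(6, 20_000))  # large file: close to the 5/6 ceiling
    print(dedup_savings(6, 200))     # small file: the stub cost eats more of the gain
```

With these assumed sizes, six reads of a 20,000-token file save about 83%, while six reads of a 200-token file save 75%. The fixed stub matters less as files grow, which fits the observation that large-file projects tend to land toward the high end.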
What pulls the number down: repos where Claude reads each file once and the output is already compact. Less to cap, less to dedup.
So I'd never pitch a flat 62% for everyone. sipcode benchmark gives the reproducible corpus figure, and sipcode proxy --stats shows your real per-session savings on your own code. I'd genuinely like to hear what your repo reports if you run it.
Watching Claude Code re-read the same file six times and burn through its own context has become a daily annoyance, so putting a cap on that is genuinely useful. The 62% reduction in tool-output usage is wild. Congrats on the launch, Anuj!
Sipcode
@eitan_elnekave Thanks Eitan, and that "6 times" number is exactly the one that made me build this. It is invisible until you measure it; you just feel the session getting heavier and blame the model.
On the 62%: it is a median over a locked 20-task benchmark in the repo, not a peak, so you can run sipcode benchmark and get your own number on your own workload. Yours will differ, but the duplicate-read waste is structural, so most people see a real cut.
Appreciate you engaging both here and on LinkedIn. When your launch goes live, send it my way.
Sipcode
Closing out my launch day soon, and I want to say thank you while it is still live.
The best part was never going to be the ranking; it was the comments. Valeria, David, and Art went straight at the dedup internals, the exact edge cases I obsess over: content-hash versus mtime, subagent context branches, LF and BOM canonicalization, uncapped output on demand. Those are the questions I most wanted someone to ask.
Gal, Harini, Rahul, and Jostin named the problem better than my own copy did: context bloat quietly poisoning a long session, and a bigger window not fixing it. Florent and Viacheslav, thank you for engaging early. Patrick and Divv, thank you for the kind words on the build and the video, and for the honest feedback I am already acting on.
I built this solo because I was burning through Claude Code usage and the bill was my only feedback loop. Having people who feel the same problem show up today meant more than the number next to the name.
I am reading every comment for the rest of the day. Keep them coming.
Claude Code rereading the same files and dumping huge logs into context is painfully familiar. I like that Sipcode tackles the boring cleanup layer instead of pretending a bigger context window fixes everything.
Sipcode
@jostin_trunerg Thanks Jostin, that is exactly the bet. A bigger context window just means more room to make the same mess. The boring cleanup layer is unglamorous but it is where the actual reliability lives. Appreciate you seeing the angle.
Congrats on the launch, Anuj! As someone who lives in Claude Code all day (I've built a whole stack of custom skills around it), context bloat from verbose tool output is a very real pain, so a tool that caps it and dedupes same-session re-reads is solving something I actually feel.
Also have to call out the launch video. It's genuinely one of the best I've seen on PH, and the music gave me a wave of RPG nostalgia. Already chatted with you about it. Starring this and trying it on my setup today. 🌟
Sipcode
@patrickaitrapp This means a lot, Patrick, thank you. Someone who lives in Claude Code all day and has built their own stack of skills is exactly who I built this for, so hearing the context-bloat pain land with you is the best signal I could get today.
And thank you for the kind words on the video. The RPG-nostalgia read is exactly the mood I was chasing, so it landing that way made my day. I would genuinely love to hear how it runs on your setup once you have tried it; your stack sounds like a real stress test.