Launching today

Sipcode
Keep Claude Code's context clean for sharper answers
67 followers
Context hygiene for Claude Code. Caps verbose tool output and dedupes same-session re-reads so the model sees signal, not noise. Anthropic measures 29% quality lift from cleaner context. Proof: 62.6% median tool-output savings on a locked 20-task benchmark. MIT.

Sipcode
Foyer
The context window management problem in Claude Code is real. Long sessions accumulate dead weight fast: old tool outputs, abandoned approaches, redundant file reads. Once the context gets bloated, the model starts hedging more and the answers get muddier. Curious whether Sipcode is doing something principled to decide what to prune (like deprioritizing failed attempts or stale file state) or whether it's more of a manual curation layer where you're telling it what to keep. Also wondering if there's any handling for cases where something that looked like a dead end earlier in the session turns out to be relevant again.
Sipcode
@fberrez1 Florent, sharp question. The distinction you are drawing is real.
Honest answer: Sipcode operates at the mechanical layer, not the semantic one. It does NOT currently decide "this approach was abandoned" or "this file is stale." That kind of semantic curation needs an LLM in the loop (kills the privacy story) or a structured intent trace (research territory).
What Sipcode does today:
1. Reads: dedup by file path + content hash. If Claude already read it and disk has not changed, the re-Read short-circuits. Original content stays in context.
2. Verbose tool output (git log, npm install, grep, find): cap volume via parameter injection. Static rules, not semantic.
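A minimal sketch of the read-dedup rule in item 1, under stated assumptions: the class name `ReadDedupCache`, the method `should_short_circuit`, and the choice of SHA-256 are all hypothetical and are not taken from Sipcode's code. It only illustrates "same path, same content hash, skip the re-read".

```python
import hashlib
from pathlib import Path


class ReadDedupCache:
    """Session-scoped record of which file contents were already served.

    Hypothetical illustration, not Sipcode's actual implementation.
    """

    def __init__(self) -> None:
        # Maps resolved path -> SHA-256 hex digest of the bytes last served.
        self._seen: dict[str, str] = {}

    def should_short_circuit(self, path: str) -> bool:
        """Return True only if this exact content was already read this session."""
        key = str(Path(path).resolve())
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        if self._seen.get(key) == digest:
            return True  # same path, same bytes: the re-read adds nothing new
        self._seen[key] = digest  # first read, or the file changed on disk
        return False
```

The first call for a path returns False and records the hash. A second call returns True only while the bytes on disk still hash to the recorded value, so an edited file is read again in full.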
On your dead-end-becomes-relevant-again case: Sipcode does not remove what is already in context. It catches DUPLICATIVE reads only. If something seemed irrelevant earlier and matters now, Claude still has the original bytes and can re-engage.
The real edge case is when Sipcode caps a verbose output (grep at 100 results) and result #500 was the one you needed. That is a failure mode. Every rewriter declares an integrity score on each fire, so over-stripping is visible in sipcode why.
Semantic curation (deprioritize failed attempts, drop stale state) is the right next layer. Honest pre-commitment: it requires an architecture I have not figured out yet, or a privacy compromise I am not willing to make. Thinking on it.
Context bloat is my #1 frustration with Claude Code in long sessions. You watch it re-read the same files and re-print npm install walls of text and by the end of a complex session the answers are noticeably worse. The 40% agent error reduction stat is the one that got my attention - quality lift is nice but errors are the thing that actually breaks workflows. The PreToolUse hook approach is smart because it intercepts before the context gets polluted rather than trying to clean up after. Installing this today. Does it handle situations where Claude Code genuinely needs to re-read a file because it changed, or does it dedupe those too?
Sipcode
@galdayan Thanks Gal, that 40% number is exactly why I lean on it over the quality lift in the copy.
To your question: no, changed files are never deduped. On every potential dedup hit, the proxy compares cached bytes against current disk bytes after LF and BOM canonicalization. If they differ by even one byte, the read goes through untouched. The cost is one stat + hash per re-read; the benefit is that I never feed Claude stale content. Designed it that way because a wrong dedup is worse than no dedup at all.
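A small sketch of what "compare after LF and BOM canonicalization" could look like. Assumptions, labeled loudly: "LF canonicalization" is read here as CRLF to LF, only the UTF-8 BOM is stripped, and the function names `canonicalize` and `is_unchanged` are hypothetical. The actual proxy may normalize differently.

```python
import hashlib

UTF8_BOM = b"\xef\xbb\xbf"


def canonicalize(data: bytes) -> bytes:
    """Strip a leading UTF-8 BOM and normalize CRLF line endings to LF.

    Assumed normalization rules, for illustration only.
    """
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    return data.replace(b"\r\n", b"\n")


def is_unchanged(cached: bytes, on_disk: bytes) -> bool:
    """True only when both inputs hash identically after canonicalization."""
    return (
        hashlib.sha256(canonicalize(cached)).digest()
        == hashlib.sha256(canonicalize(on_disk)).digest()
    )
```

With this rule, a file that only gained a BOM or switched line endings still counts as unchanged, while any real content difference, even one byte, forces the read through.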
Tendem by Toloka
Hey, congrats!
A couple of questions.
Have you measured the quality performance somehow? I mean, the speed/quality on certain tasks.
Also - is it configurable by Claude to "disable" it if needed, if it thinks that the hook over-stripped the content?
Thanks!
Sipcode
@perrymason Hey Viacheslav, thanks for the early look and the real questions.
On quality measurement: no controlled A/B on real user tasks yet. What I measure directly is per-rewriter signal kept (every rewriter declares an integrity score on each fire), tool-output savings on a locked 20-task benchmark (62.6% median, range 37.4% to 80.6%, reproducible via sipcode benchmark from the repo), and per-session proxy stats.
The 29% quality lift number is Anthropic's published research, not mine. I am careful not to claim Sipcode users specifically see 29%. The gap between "context got cleaner" (measurable) and "answers got better by X%" (requires controlled experiments) is real and I would rather flag it than oversell.
On configurability, three layers:
Per-tool-call: if Claude passes an explicit parameter (head_limit on Grep, count output mode, explicit offset on Read), the relevant rewriter detects the user-supplied value and steps aside. Claude can effectively opt out of compression for a specific call by being explicit. Rewriters skip rather than fight.
Per-rewriter selective disable via env var or config: not shipped yet. Honest gap. Today a user who hits over-stripping either passes an explicit param on that call or removes the proxy entirely via sipcode proxy --uninstall.
Per-session bypass triggered from inside the agent: also not shipped. Your specific scenario, where Claude itself decides "this hook over-stripped, back off for now", is a really good design idea I have not built. The per-fire integrity scores are there, so the data exists. Wiring it to an agent-side self-modulation primitive is something I want to think about for v1.7.
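To make the per-tool-call layer concrete, here is a hedged sketch of a rewriter that injects a cap unless the caller already bounded the output. The function name `cap_grep`, the `output_mode` key, and the default of 100 (borrowed from the "grep at 100 results" example earlier in the thread) are assumptions for illustration, not Sipcode's real rewriter.

```python
from typing import Any

DEFAULT_HEAD_LIMIT = 100  # illustrative default, not a confirmed Sipcode value


def cap_grep(tool_input: dict[str, Any]) -> tuple[dict[str, Any], bool]:
    """Return (possibly rewritten tool input, whether the rewriter fired)."""
    # An explicit head_limit or a count-only output mode means the caller
    # already bounded the output, so the rewriter skips rather than fights.
    if "head_limit" in tool_input or tool_input.get("output_mode") == "count":
        return tool_input, False
    # Copy instead of mutating, so the original call stays inspectable.
    rewritten = {**tool_input, "head_limit": DEFAULT_HEAD_LIMIT}
    return rewritten, True
```

Returning the "fired" flag alongside the input keeps the skip decision observable, which is the property the per-fire stats depend on.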
Tendem by Toloka
@axlerodd Thanks, glad that you are thinking about improving the product)
Regarding quality, I think it may not even be about boosting it, but rather about sustaining a similar quality level (a drop of 5-10% can be acceptable) while token usage falls by 50/60/70%. However, that requires thorough benchmarking, as performance may differ across tasks. And that's something many compression products lack... Glad that you don't hide that)
Sipcode
@perrymason Viacheslav, that "sustaining" framing is sharper than where I was. Honest base case: same quality, lower tokens. Anything else is gravy.
Real benchmark needs a corpus someone else picks (not mine, since curated by me = optimized by me), agent-eval style with verifiable outcomes. Toggle sipcode on/off, measure agreement rate or judged quality.
I have not built it yet. Consider this a public commitment that I should; it is on my list. If you have seen any agent-eval frameworks that handle seed/temperature non-determinism well, I would love a pointer.
Easier to flag the gap than ship a slogan I cannot back up.
Congrats on the launch! Keeping Claude Code context clean is a very real pain point for anyone building with AI coding tools. I like the focus on sharper answers instead of just longer context. How are you deciding what should stay in context versus what should be summarized or dropped?
Sipcode
@rahulbhavsar Thanks Rahul. The rule is intentionally boring: I never summarize and I never drop anything model-facing. I only rewrite where I can prove the rewrite preserves every fact Claude could realistically need next.
So for Bash output I cap volume (head_limit on grep, truncating npm install walls). For Read I dedup byte-identical re-reads inside the same session, with a hash check against current disk so changed files always pass through. Each rewriter declares a 0-1 integrity score so the savings number is never decoupled from how lossy the rewrite is.
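One way to picture a rewriter reporting savings and integrity together is sketched below. The thread does not say how the integrity score is computed, so the definition here (fraction of original lines kept) is purely an assumption, as are the names `Fire` and `cap_lines`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fire:
    """Result of one rewriter invocation (hypothetical shape)."""

    output: str       # text passed on to the model
    savings: float    # fraction of characters removed, 0-1
    integrity: float  # fraction of original lines kept, 0-1 (assumed definition)


def cap_lines(text: str, max_lines: int) -> Fire:
    """Keep the first max_lines lines and report both savings and integrity."""
    lines = text.splitlines()
    kept = lines[:max_lines]
    output = "\n".join(kept)
    savings = 1 - len(output) / len(text) if text else 0.0
    integrity = len(kept) / len(lines) if lines else 1.0
    return Fire(output, savings, integrity)
```

Because both numbers come out of the same call, a large savings figure always arrives next to the score that says how much was cut to get it.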
Semantic summarization and importance ranking are higher-leverage and I have research on both, but neither clears the bar I've set for shipping into someone else's session. Lossless first, lossy never.
Are you hitting a case where you wish it dropped more aggressively, or kept more?
Claude Code users know how quickly context gets polluted with logs, repetitive outputs, and tool noise 😅 The idea of treating context as a limited resource rather than an infinite one really resonates. Curious... what was the most surprising source of context bloat you discovered while building Sipcode?
Sipcode
@harini_mukesh Thanks Harini. Honest answer: it was not the verbose tool output, even though that is the biggest absolute number. It was watching Claude re-read the same file three times in a single task because each tool call thinks it is starting fresh.
I built a quick counter expecting maybe 5-10% of reads to be duplicates. The real number on a 4-hour refactor session was 38%. More than a third of all Reads were the model looking at bytes it had literally just seen. That is not a model failure, it is a memory architecture failure: the agent does not have a cheap way to remember "I already loaded this", so it just re-fetches and pays the token cost.
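A counter like the one described can be very small. The sketch below assumes each Read is logged as a (path, content hash) pair; that log format and the function name `duplicate_read_ratio` are hypothetical, not the counter actually used.

```python
from collections.abc import Iterable


def duplicate_read_ratio(reads: Iterable[tuple[str, str]]) -> float:
    """Fraction of reads whose (path, content_hash) pair was already seen."""
    seen: set[tuple[str, str]] = set()
    total = 0
    duplicates = 0
    for event in reads:
        total += 1
        if event in seen:
            duplicates += 1  # identical bytes from the same path, seen before
        else:
            seen.add(event)
    return duplicates / total if total else 0.0
```

Keying on the hash as well as the path means a re-read after an edit counts as a fresh read, so only byte-identical repeats inflate the ratio.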
The surprising part is that this is invisible. You feel like the session is slowing down, you blame the model, you switch to a smaller context. You do not realize you are paying 800 tokens to re-read the same file Claude saw 90 seconds ago.
The npm install walls and tsc dumps are the obvious wins. The re-read pattern is the one that quietly eats half your context window before you notice.
What does your context-pollution profile look like when you actually measure it?
Sipcode
@divvsaxena Divv, thanks. Fair point on the voiceover. Went visual-only because most X/LinkedIn previews play muted, but you are right that it would carry more.
A voiceover cut with the dogfood story narrated is a really good post-launch follow-up. Adding to the list. Appreciate the eye.