noirdoc - PII guard for Claude Code to keep client data out of context

by•3mo ago

Open-source plugin that redacts PII before Claude Code reads it. Names, emails, and IBANs become placeholders Claude sees; you reveal originals locally. Same engine ships as a CLI for any LLM and a hosted proxy for OpenAI/Anthropic/Azure.

Replies

Maker

Hey, I built noirdoc because I kept seeing the same pattern on consulting engagements: a team wants to use Claude on a real customer ticket, and Legal kills it because the prompt would leak names, IBANs, and addresses to a third party.

The plugin is a PreToolUse hook for Claude Code. When Claude tries to read a file from a path you mark sensitive, the plugin redacts it first: names and IBANs become placeholders, Claude works on the placeholders, and you reveal the originals locally in your own terminal. Real names never enter Claude's transcript.

Detection runs locally with a Presidio + Flair + GLiNER ensemble, German-tuned (it handles "Müller, Anna" and lowercase legal text where most tools fail). Redaction is reversible, with consistent placeholders across sessions.

The same engine ships three ways: this plugin (OSS), a CLI on PyPI for any LLM workflow (OSS), and a hosted proxy that does the same thing transparently for OpenAI / Anthropic / Azure calls if your team needs that.

Curious what you'd want it to redact next: what's the data you can't paste into an LLM today?
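For readers unfamiliar with the hook mechanism, here is a minimal sketch of the gating step only: deciding whether a PreToolUse event is a `Read` of a file under a sensitive directory. It is not noirdoc's code. The event fields (`tool_name`, `tool_input.file_path`) and the "exit code 2 blocks the call" convention reflect my understanding of Claude Code hooks and should be checked against the current hook documentation; `SENSITIVE_DIRS`, `should_intercept`, and `run_hook` are hypothetical names.

```python
import json
import sys
from pathlib import Path

# Hypothetical: directories the user has marked as sensitive.
SENSITIVE_DIRS = [Path("/srv/clients")]


def is_sensitive(file_path: str, sensitive_dirs=SENSITIVE_DIRS) -> bool:
    """Return True if file_path lies inside one of the sensitive directories."""
    target = Path(file_path).resolve()
    # is_relative_to compares whole path components, so /srv/clients-old
    # is not treated as being inside /srv/clients.
    return any(target.is_relative_to(d.resolve()) for d in sensitive_dirs)


def should_intercept(event: dict, sensitive_dirs=SENSITIVE_DIRS) -> bool:
    """Decide whether a PreToolUse event is a Read of a sensitive file."""
    if event.get("tool_name") != "Read":
        return False
    file_path = event.get("tool_input", {}).get("file_path", "")
    return bool(file_path) and is_sensitive(file_path, sensitive_dirs)


def run_hook(raw_event: str, sensitive_dirs=SENSITIVE_DIRS) -> int:
    """Parse the hook's JSON input and return the exit code to use.

    Assumption: exit code 2 blocks the tool call and stderr is shown to
    Claude; a real plugin would redact the file here instead of only blocking.
    """
    event = json.loads(raw_event)
    if should_intercept(event, sensitive_dirs):
        print("Path is marked sensitive; redact before reading.", file=sys.stderr)
        return 2
    return 0
```

A hook script built on this would end with `sys.exit(run_hook(sys.stdin.read()))`.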

3mo ago

Privacy isn’t optional anymore. This gets it

3mo ago

voice→form widgets are the case i kept hitting — transcribed names, addresses, phone numbers flowing straight into prompts before anyone notices. the moment you try to use real customer voice samples, Legal blocks the whole pipeline.

the German-tuned detection is the part that actually matters — most tools fail on lowercase legal text and "Müller, Anna" patterns, and false positives there kill adoption faster than missed PII.

3mo ago

The local reveal step is the part that makes this feel usable in real consulting work. Redaction alone is never enough if the team cannot safely get back to the real values before shipping. Curious how the detector handles domain specific identifiers that are not classic PII.
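To make the redact-then-reveal round trip concrete, here is a minimal sketch that assumes a simple in-memory mapping and handles emails only. The `[EMAIL_1]` placeholder format and the `PlaceholderVault` class are hypothetical illustrations, not noirdoc's actual format or API; keeping placeholders consistent across sessions would additionally require persisting the mapping locally.

```python
import re

# Deliberately simple email pattern, good enough for a sketch.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches the hypothetical placeholder format, e.g. [EMAIL_1].
PLACEHOLDER_RE = re.compile(r"\[[A-Z]+_\d+\]")


class PlaceholderVault:
    """Maps original values to stable placeholders and back (in memory only)."""

    def __init__(self):
        self.to_placeholder = {}  # original value -> placeholder
        self.to_original = {}     # placeholder -> original value

    def _placeholder_for(self, kind: str, value: str) -> str:
        # Reuse the existing placeholder so the same value always maps
        # to the same token within this vault.
        if value not in self.to_placeholder:
            token = f"[{kind}_{len(self.to_placeholder) + 1}]"
            self.to_placeholder[value] = token
            self.to_original[token] = value
        return self.to_placeholder[value]

    def redact(self, text: str) -> str:
        """Replace every email with its placeholder."""
        return EMAIL_RE.sub(lambda m: self._placeholder_for("EMAIL", m.group()), text)

    def reveal(self, text: str) -> str:
        """Restore originals; unknown placeholders are left untouched."""
        return PLACEHOLDER_RE.sub(
            lambda m: self.to_original.get(m.group(), m.group()), text
        )
```

Only the redacted string would be sent to the model; `reveal` runs locally on the model's output.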

3mo ago

Maker

@stefansamne Today the detector ships with a fixed set of types (names, emails, IBANs, German tax IDs, addresses), so it doesn't yet catch client codenames, internal project IDs, or contract references.

But we could expose GLiNER's zero-shot interface so that it can be extended with custom labels! Definitely something we will put on the roadmap!

3mo ago

Apart from financial information, can it also detect credentials that are passed accidentally?

3mo ago

Maker

@zabbar Not yet, but we have it on the roadmap!

3mo ago

This solves a real pain point. I work at a fintech and we hit the same wall constantly: engineers want to use Claude Code on production support tickets but the moment there's a customer IBAN or address in the file, compliance shuts it down. The PreToolUse hook approach is smart because it catches things before they ever leave your machine. Curious about the performance overhead on large files (e.g. 10K+ line logs with scattered PII). Does the Presidio+Flair ensemble add noticeable latency, or is it fast enough to feel invisible during normal Claude Code usage?

3mo ago

Maker

@elijahbowlby Honest answer: it will not be invisible on 10K-line files with the full pipeline. Flair's the bottleneck here. For huge logs you can drop Flair and run Presidio + spaCy only — ~10× faster, you lose some German name recall but the regex layer still catches IBANs and emails (which is the actual fintech failure mode).

The hook is async, so Claude Code shows a "redacting…" state instead of blocking. But yeah — for large files, you will feel the latency!
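As an illustration of why a regex layer is cheap compared with NER, here is a minimal sketch of IBAN detection: a candidate pattern followed by the standard mod-97 checksum, which filters out most random alphanumeric strings. This is not Presidio's or noirdoc's implementation; `IBAN_RE`, `is_valid_iban`, and `find_ibans` are hypothetical names, and the pattern ignores country-specific lengths.

```python
import re

# Candidate pattern: country code, two check digits, then 11-30 alphanumerics,
# each optionally preceded by a single space (IBANs are often printed in groups).
# Limitation of this sketch: an uppercase token directly after the IBAN can be
# absorbed into the candidate and make the checksum fail.
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]){11,30}\b")


def is_valid_iban(candidate: str) -> bool:
    """Check structure and the mod-97 checksum of an IBAN candidate."""
    iban = re.sub(r"\s", "", candidate).upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", iban):
        return False
    # Move the first four characters to the end, then map letters to
    # numbers (A=10 ... Z=35); base-36 digit values give exactly that mapping.
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def find_ibans(text: str) -> list:
    """Return all substrings that look like IBANs and pass the checksum."""
    return [m.group() for m in IBAN_RE.finditer(text) if is_valid_iban(m.group())]
```

Both steps are linear in the input size, so a pass like this stays fast even on very large logs.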

3mo ago

How do you detect names? A dictionary? An LLM or model of some sort?

3mo ago

Maker

@robert_douglass No, a dictionary would be too brittle. It uses a combination of tools.

Presidio handles regex-based PII (IBAN, email, tax IDs). For names, we use three NER models, all local:

The NER component of spaCy's de_core_news_lg pipeline (called via Presidio)
Flair's de-ner-large (dedicated NER model, separate pass — catches "Schmidt, Lisa" comma-form and lowercase legal text)
GLiNER (zero-shot — add custom entity types at runtime without retraining)

Each NER model fails differently, so we combine all three: the union of their detections has better recall than any single model.
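Taking the union of several detectors' outputs can be sketched as merging character spans. This is a minimal illustration that assumes each detector returns `(start, end)` offsets with an exclusive end; `union_spans` is a hypothetical helper, not part of Presidio, Flair, GLiNER, or noirdoc.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def union_spans(*detector_outputs: List[Span]) -> List[Span]:
    """Merge spans from any number of detectors into non-overlapping spans.

    Overlapping or touching spans are fused, so text flagged by at least
    one detector ends up inside exactly one merged span.
    """
    spans = sorted(span for output in detector_outputs for span in output)
    merged: List[Span] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Extend the previous span instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For the text "Müller, Anna wrote to Lisa Schmidt.", one detector might find only "Anna" and "Lisa Schmidt", another the full "Müller, Anna", and a third "Müller" and "Schmidt"; the union still covers both full names. The trade-off is that a union also accumulates every detector's false positives.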

3mo ago

@antonio_maiolo don't you find NER to be quite slow (and a large download) for local use? I had a hard time getting the accuracy / time / size equation right.

3mo ago

This is pretty smart. I can imagine a lot of corporations needing this tool, including my own consulting firm.

Are you able to create your own dictionary as well? Can this be deployed at the enterprise level?

3mo ago

Maker

@amoakk At the moment no, but definitely an interesting idea which I will add to the roadmap!

We also offer a separate product, an API proxy covering all OpenAI-compatible APIs and the Claude API:

https://www.noirdoc.de/docs/getting-started/

This can be self-hosted or managed by us!

3mo ago