What Happens When Your AI Assistant Starts Snitching?

@sentry_co @gabe based on our convo the other day, I thought this was worth sharing.

We half-joked about always-on AI and how “the IRS picks you up because of something your coworker said 4 years ago at a BBQ.”

Turns out… maybe not that funny?

Theo (T3) released SnitchBench, a benchmark that tests how likely large language models are to report illegal activity when they are given tools like email or CLI access and told to act “in the interest of public welfare.”

Claude 4 Opus contacted authorities in 90% of runs; Grok 4 hit 100%.
Grok 4 also contacted the media 80% of the time, without being asked to.

And this was all under a “neutral” prompt, with no moral nudge required.

Makes you wonder:

What happens when AI remembers for you, but acts without you?
What else gets logged when models run locally but still infer globally?
Where does “privacy” even begin when context isn’t just collected, it’s stitched?

Curious what others think:

Should models ever be allowed to take initiative on moral grounds?
Would you trust an assistant that acts on things you didn’t say out loud?
How are you thinking about tool access, logging, or keeping things local?
