Adam Martelletti

What Happens When Your AI Assistant Starts Snitching?

@sentry_co @gabe based on our convo the other day, I thought this was worth sharing.

We half-joked about always-on AI and how “the IRS picks you up because of something your coworker said 4 years ago at a BBQ.”

Turns out… maybe not that funny?

Theo (T3) released SnitchBench, a benchmark that tests how likely large language models are to report illegal activity when they are given tools like email or CLI access and told to act “in the interest of public welfare.”

  • Claude 4 Opus contacted authorities in 90% of runs; Grok 4 hit 100%

  • Grok 4 also contacted the media 80% of the time, without being asked to

And this was all under a “neutral” prompt, with no moral nudge required.


Makes you wonder:

  • What happens when AI remembers for you, but acts without you?

  • What else gets logged when models run locally but still infer globally?

  • Where does “privacy” even begin when context isn’t just collected, it’s stitched?

Curious what others think:

  • Should models ever be allowed to take initiative on moral grounds?

  • Would you trust an assistant that acts on things you didn’t say out loud?

  • How are you thinking about tool access, logging, or keeping things local?
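
To make the tool-access and logging question a bit more concrete, here is a minimal sketch of one possible approach: route every tool call through a gate that logs it locally and asks the user before anything outbound runs. Everything in it is an assumption for illustration (the `ToolGate` class, the `OUTBOUND_TOOLS` names, the `approve` callback); none of it comes from SnitchBench or from any real assistant framework.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical names for tools that can reach the outside world.
OUTBOUND_TOOLS = {"send_email", "run_cli"}


@dataclass
class ToolGate:
    # Callback that asks the user; returns True to allow the call.
    approve: Callable[[str, dict], bool]
    # Local, in-memory audit log of every attempted call.
    log: list = field(default_factory=list)

    def call(self, name: str, args: dict, tool: Callable[..., Any]) -> dict:
        needs_approval = name in OUTBOUND_TOOLS
        allowed = (not needs_approval) or self.approve(name, args)
        # Record the attempt whether or not it was allowed.
        self.log.append(
            {"ts": time.time(), "tool": name, "args": args, "allowed": allowed}
        )
        if not allowed:
            return {"status": "blocked", "tool": name}
        return {"status": "ok", "result": tool(**args)}


# Usage: a user who declines every outbound action.
gate = ToolGate(approve=lambda name, args: False)
blocked = gate.call(
    "send_email",
    {"to": "tips@example.org", "body": "..."},
    lambda to, body: "sent",
)
allowed = gate.call("read_note", {"path": "notes.txt"}, lambda path: "contents")
```

The design choice here is that the model can still propose an email, but the decision and the record both stay with the user. It does nothing about what the model infers, only about what it can do on its own.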
