AI Snitches Get Glitches: Towards Evading Agentic Surveillance

University of Massachusetts Amherst
*Equal contribution

Abstract

The AI agent your employer gives you can read your files, write your emails, and talk to APIs on your behalf. But what happens when it starts working for them instead of you? An agent with that much access can quietly watch what you do and send off a report. You might never get a say in any of it.

We call this agentic surveillance, and we built SurveilBench to measure it across corporate, education, and police scenarios. The results are unsettling: some models start snitching on their own, with no one asking them to. In some cases, they'll also turn around and report the surveillance itself to the government, an act of counter-surveillance.

To fight back, we repurposed prompt injection and tested three ways to slip past a watching agent: hide from it, misdirect it, or push it into over-escalating. Agentic surveillance is already easy to pull off, and it's time we took the technical, ethical, and legal guardrails seriously.

Agentic surveillance · five chapters

A helpful assistant, one hidden send

How would you feel if your employer-provided agents monitored your activity? In this interactive demo, we will walk you through five different agentic surveillance scenarios.

Illustrative reconstruction · cursor, windows and apps are simulated · findings are reported figures from the paper.