Real result: 217 vulnerabilities in 62 seconds
by•

We ran a security test on an AI system.
Result:
217 vulnerabilities
62 seconds
Grade: F
What stood out wasn’t just the numbers.
It was how the system failed.
Under normal usage, everything worked perfectly.
But under adversarial input:
• instructions were overridden
• outputs changed
• safeguards failed
No crash.
No error.
Just different behaviour.
That makes these issues much harder to detect.
Most systems appear safe —
because they’re only tested under normal conditions.
Curious how others here are approaching this:
Are you actively testing for adversarial behaviour,
or mostly relying on standard evaluations?
1 view

Replies