Real result: 217 vulnerabilities in 62 seconds

We ran a security test on an AI system.

Result:

217 vulnerabilities

62 seconds

Grade: F

What stood out wasn’t just the numbers.

It was how the system failed.

Under normal usage, everything worked perfectly.

But under adversarial input:

• instructions were overridden

• outputs changed

• safeguards failed

No crash.

No error.

Just different behaviour.

That makes these issues much harder to detect.

Most systems appear safe —

because they’re only tested under normal conditions.

Curious how others here are approaching this:

Are you actively testing for adversarial behaviour,

or mostly relying on standard evaluations?

1 view