How are you testing AI agents under adversarial input?

We’ve been testing AI agents under adversarial input over the last few days.

One thing stood out:

Most systems behave perfectly under normal usage…

but fail quickly when the input is intentionally manipulated.

Things like:

• prompt injection

• instruction override

• role confusion

These aren’t rare edge cases — they’re surprisingly easy to trigger.

Which made us think:

Are we testing AI systems the wrong way?

Right now, most teams seem to focus on:

• accuracy

• performance

• latency

But not how systems behave under pressure.

Curious how others here are approaching this:

→ Do you actively test for adversarial scenarios?

→ Or is it still mostly functional testing?

Would love to hear real experiences.

1 view