How are you testing AI agents under adversarial input?
by•
We’ve been testing AI agents under adversarial input over the last few days.
One thing stood out:
Most systems behave perfectly under normal usage…
but fail quickly when the input is intentionally manipulated.
Things like:
• prompt injection
• instruction override
• role confusion
These aren’t rare edge cases — they’re surprisingly easy to trigger.
Which made us think:
Are we testing AI systems the wrong way?
Right now, most teams seem to focus on:
• accuracy
• performance
• latency
But not how systems behave under pressure.
Curious how others here are approaching this:
→ Do you actively test for adversarial scenarios?
→ Or is it still mostly functional testing?
Would love to hear real experiences.
1 view

Replies