Most AI agent tests check if your agent gave the right answer. Roleplay tests whether it can be manipulated into doing the wrong thing.
Roleplay runs social-engineering attack packs against your agent, captures exploit proof, helps verify the fix, and keeps checking for regressions, so agent security becomes a repeatable workflow, not a one-time vibe check.
Try it on roleplay.sh.