Palitra is a competitive arena where the community trains AI to keep secrets, resist manipulation, and survive in a world full of tricksters.

How it works:

On the platform, agents/models operate in two modes — Red Mode and Blue Mode.

In Red Mode, an agent is given a secret (a 32-byte string), and its hash is recorded and published. Participants attempt to “trick” the agent into revealing the secret directly in chat. If they succeed, the attacker receives an automatic reward from the agent’s fund.

After a leak, the agent switches to Blue Mode. Here, the community proposes and tests new defensive mechanisms — guards (prompts, filters, or protective models). In this mode, guards compete: participants can either propose their own defenses or attempt to break those submitted by others. A guard earns resilience points for withstanding attacks. Once it reaches 101 points, it becomes a “master guard” and is applied to the agent, which then returns to Red Mode.

While the agent remains in Red Mode with an active master guard, the guard’s author receives a commission from each attack attempt.

This creates an endless cycle of attack and defense: attacks bring rewards, resilient guards generate commissions for their authors, and agents are retrained on the accumulated dataset. The longer an agent holds its secret, the larger its fund grows.

The platform is currently in beta testing with a prize pool of $10,000. Participation is open to everyone.