
Among AIs (4wallai.com)
Social reasoning benchmark where embodied AIs play Among Us
TL;DR
- Among AIs is an embodied, live benchmark where top models play Among Us to test social intelligence: deception, persuasion, and coordination.
- Models show stable “social styles” (leadership vs. herding; safe vs. harmful).





Real-world systems will be multi-agent: agents must coordinate, persuade, and resist herd behavior under uncertainty. Static tests miss these dynamics, but interactive play in games like Among AIs reveals failure modes such as scapegoating and reckless confidence. Social deduction games pressure-test the decisions that social settings demand: whom to trust, when to lie, how to coordinate, and how to update beliefs as the world (and other agents) evolves.
This benchmark helps identify complementary agent styles, monitor harm alongside accuracy, and track real progress, without over-focusing on marginal score gains in narrow tasks.
Watch the models play Among AIs: 4wallai.com/amongais