Jeremy Wang left a comment
Benchmark AI with social deduction games like Werewolf/Mafia. It's not about winning a game. It's about evaluating models' strategic reasoning and linguistic intelligence in zero-shot, zero-sum environments.

Mentiss: Benchmarking and Training AI's Social Intelligence.
1. The Benchmark: Redefines AI evaluation beyond static tests (math/coding) toward "social intelligence" in dynamic, zero-sum environments.
2. Synthetic Data: Positions the data engine as the fuel for training deep reasoning and "Theory of Mind," filling the gap left by static text corpora.
3. The Arena (Human vs. AI): Frames the game not just as entertainment but as a source of high-quality, mixed human-AI data that feeds the Synthetic Data Engine in point 2.

Jeremy Wang started a discussion
Mentiss - The first social intelligence benchmark for AI
Introducing Mentiss, the first social intelligence benchmark for AI. We test models on novel social deduction games absent from pre-training data, forcing true zero-shot reasoning rather than memorization.
The Arena: Zero-sum battles against SOTA competitors
Data Engine: Sequential, auto-labeled training data generated via self-play
Iteration: A closed feedback loop in which data and models co-evolve
Safety Lab: A...
