Jeremy Wang

Benchmarking LLMs via Werewolf (Mafia)

About

I am building a startup to benchmark LLMs via social deduction games.

Maker History

Mentiss - Benchmarking and Training AI's Social Intelligence.
Jan 2026

Joined Product Hunt on January 21st, 2026

Forums

p/mentiss • 6mo ago

Mentiss - The first social intelligence benchmark for AI

Introducing Mentiss - The first social intelligence benchmark for AI.

We test models on novel social deduction games that are absent from pre-training data, forcing true zero-shot reasoning rather than memorization.

The Arena: Zero-sum battles against SOTA competitors

6mo ago

Mentiss - Benchmarking and Training AI's Social Intelligence.

1. The Benchmark: Redefining AI evaluation beyond static tests (math/coding) to measure "Social Intelligence" in dynamic, zero-sum environments.
2. Synthesis Data: Positioning the data engine as the fuel for training deep reasoning and "Theory of Mind," filling the gap left by static text corpora.
3. The Arena (Human vs. AI): Framing the game not just as entertainment, but as a source of high-quality, human-mixed AI data that feeds the Synthesis Data engine described in point 2.