Jeremy Wang's profile on Product Hunt

All activity

2mo ago

Benchmark AI with social deduction games like Werewolf/Mafia. The goal is not to win a game but to evaluate models' strategic reasoning and linguistic intelligence in zero-shot, zero-sum environments.

Mentiss: Benchmarking and Training AI's Social Intelligence.

Jeremy Wang hunted Mentiss

2mo ago

1. The Benchmark: Redefining AI evaluation beyond static tests (math/coding) to measure "Social Intelligence" in dynamic, zero-sum environments.
2. Synthesis Data: Positioning the data engine as the fuel for training deep reasoning and "Theory of Mind," filling the gap left by static text corpora.
3. The Arena (Human vs. AI): Framing the game not just as entertainment but as a source of high-quality, human-mixed AI data that feeds the Synthesis Data engine in point 2.

Mentiss: Benchmarking and Training AI's Social Intelligence.

Jeremy Wang started a discussion

2mo ago

Mentiss - The first social intelligence benchmark for AI

Introducing Mentiss - The first social intelligence benchmark for AI. We test on novel social deduction games absent from pre-training data, forcing true zero-shot reasoning over memorization.
The Arena: Zero-sum battles against SOTA competitors
Data Engine: Sequential auto-labeled training data via self-play
Iteration: A closed feedback loop where data and models co-evolve
Safety Lab: A...