s1

Matching o1-preview with Only 1000 Examples

s1 is a simple recipe for test-time scaling of LLMs, achieving reasoning performance comparable to o1-preview using only 1,000 training examples and a technique called budget forcing. The model, data, and code are all open-source.


Zac Zuo

Hi everyone,

Sharing s1, a new approach to improving LLM performance at test time. This work comes from researchers at Stanford University and the University of Washington, and offers some exciting results:

- Strong Reasoning: Matches the performance of larger models (like o1-preview) on reasoning tasks.
- Minimal Data: Achieves this with only 1,000 examples (insane, right?).
- Budget Forcing: Uses a novel "budget forcing" technique during inference.
- Test-Time Scaling: Improves performance without retraining the model.
- Fully Open-Source: Model, data, and code are all available.