s1

Matching o1-preview with Only 1000 Examples

s1 is a simple recipe for test-time scaling of LLMs. Using only 1,000 training examples and a "budget forcing" technique, it achieves strong reasoning performance comparable to o1-preview. The model, data, and code are all open-source.

Zac Zuo
Hi everyone,

Sharing s1, a new approach to improving LLM performance at test time. This work comes from researchers at Stanford University and the University of Washington, and offers some exciting results:

· Strong Reasoning: Matches the performance of larger models (like o1-preview) on reasoning tasks.
· Minimal Data: Achieves this with only 1,000 examples (insane, right?).
· Budget Forcing: Uses a novel "budget forcing" technique during inference.
· Test-Time Scaling: Improves performance without retraining the model.
· Fully Open-Source: Model, data, and code are all available.
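To give a flavor of the budget-forcing idea: at inference time, the model's end-of-thinking delimiter is suppressed and a continuation cue (e.g. "Wait") is appended, nudging the model to keep reasoning and double-check its answer. Here's a minimal, runnable sketch of that control loop; the `generate` function is a hypothetical stand-in for an actual LLM call, not the authors' implementation.

```python
def generate(prompt, stop_tokens):
    """Hypothetical stand-in for an LLM call that returns generated text.
    It returns a canned response here so the sketch runs end to end."""
    return "\n...step-by-step reasoning... tentative answer: 42"

def budget_forced_generate(prompt, min_continuations=2, end_think="</think>"):
    """Sketch of budget forcing: suppress the end-of-thinking token and
    append 'Wait' to extend the reasoning trace a minimum number of times."""
    trace = ""
    for step in range(min_continuations):
        # Stop generation before the model can close its thinking block.
        chunk = generate(prompt + trace, stop_tokens=[end_think])
        trace += chunk
        if step < min_continuations - 1:
            # Cue the model to re-examine its answer instead of stopping.
            trace += "\nWait"
    # Finally allow the thinking block to close.
    return trace + end_think

print(budget_forced_generate("Solve: 6 * 7 = ?"))
```

A maximum budget can be enforced the same way, by forcibly inserting the end-of-thinking token once a token limit is reached.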