s1

Matching o1-preview with Only 1000 Examples

s1 is a simple recipe for test-time scaling of LLMs. It achieves reasoning performance comparable to o1-preview using only 1,000 examples and budget forcing. The model, data, and code are open source.

Zac Zuo
Hi everyone, sharing s1, a new approach to improving LLM performance at test time. This work comes from researchers at Stanford University and the University of Washington, and offers some exciting results:

· Strong Reasoning: Performance comparable to o1-preview on reasoning tasks.
· Minimal Data: Achieves this with only 1,000 training examples (insane, right?).
· Budget Forcing: Uses a novel "budget forcing" technique during inference to control how long the model reasons.
· Test-Time Scaling: Performance improves as more compute is spent at inference, with no further retraining of the model.
· Fully Open-Source: Model, data, and code are all available.
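
For readers wondering what budget forcing could look like in practice, here is a minimal sketch. It assumes the technique works roughly like this: if the model tries to stop reasoning before a minimum budget is reached, the stop is suppressed and the word "Wait" is appended so it keeps going; if it hits a maximum budget, reasoning is cut off and the answer is forced. The names `budget_forced_thinking`, `step`, `toy_step`, and the `</think>` delimiter are hypothetical stand-ins for illustration, not the s1 codebase or any real library API, and the toy "model" is just a stub so the example runs on its own.

```python
from typing import Callable, List

END_THINK = "</think>"  # hypothetical end-of-thinking delimiter (assumption)
WAIT = "Wait"           # appended to nudge the model into reasoning longer


def budget_forced_thinking(
    step: Callable[[List[str]], str],
    prompt: List[str],
    min_tokens: int,
    max_tokens: int,
) -> List[str]:
    """Generate thinking tokens, keeping their count within the budget.

    `step` is a hypothetical next-token function: it takes the context so far
    and returns one token.
    """
    thinking: List[str] = []
    while len(thinking) < max_tokens:
        token = step(prompt + thinking)
        if token == END_THINK:
            if len(thinking) >= min_tokens:
                break  # the model stopped on its own, within budget
            # The model tried to stop too early: suppress the stop and
            # append "Wait" instead, so it continues reasoning.
            token = WAIT
        thinking.append(token)
    # Close the thinking phase, whether the model stopped or the cap was hit.
    thinking.append(END_THINK)
    return thinking


def toy_step(context: List[str]) -> str:
    """Stub model: tries to stop after every two consecutive 'think' tokens."""
    return END_THINK if context[-2:] == ["think", "think"] else "think"


if __name__ == "__main__":
    # Minimum budget of 5 tokens forces one "Wait" before the model may stop.
    print(budget_forced_thinking(toy_step, ["Q"], min_tokens=5, max_tokens=8))
    # Maximum budget of 1 token cuts reasoning short.
    print(budget_forced_thinking(toy_step, ["Q"], min_tokens=0, max_tokens=1))
```

In a real setup, `step` would be replaced by a call into an actual decoding loop, and the budget would likely be counted in tokenizer tokens rather than list entries. The s1 repository is the place to check how the authors actually implement it.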