~1s cold start for a 32B model.
Most setups we’ve seen fall into two buckets:
• cold starts that take anywhere from several seconds to minutes (model load + init)
• keeping GPUs warm to avoid them
We’ve been experimenting with restoring initialized model state instead of reloading weights.
This demo shows a ~1s cold start for a 32B model:
https://youtu.be/G8DsbS1mcwo
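To make the "restore instead of reload" idea concrete, here is a toy sketch in plain Python. It is only an analogy and not the mechanism used in the demo: the function names (`initialize_state`, `snapshot_state`, `restore_state`) are hypothetical, and the "model" is just a dict of numbers serialized with the standard `pickle` module. The point is the shape of the approach: pay the initialization cost once, capture the initialized state, and on later starts rebuild from that capture rather than re-running initialization.

```python
import pickle
import time


def initialize_state(n_params: int) -> dict:
    """Slow path: a stand-in for 'load weights + init' (hypothetical)."""
    weights = [((i * 31) % 97) / 97.0 for i in range(n_params)]
    return {"weights": weights, "initialized": True}


def snapshot_state(state: dict) -> bytes:
    """Capture the already-initialized state as a single blob."""
    return pickle.dumps(state, protocol=pickle.HIGHEST_PROTOCOL)


def restore_state(blob: bytes) -> dict:
    """Alternative path: rebuild the state from the blob, skipping initialization."""
    return pickle.loads(blob)


if __name__ == "__main__":
    t0 = time.perf_counter()
    state = initialize_state(200_000)
    t_init = time.perf_counter() - t0

    blob = snapshot_state(state)

    t0 = time.perf_counter()
    restored = restore_state(blob)
    t_restore = time.perf_counter() - t0

    # The restored state should be identical to the freshly initialized one.
    assert restored == state
    print(f"init: {t_init:.4f}s  restore: {t_restore:.4f}s")
```

Timings from this toy say nothing about real GPU workloads; a real system would have to capture device memory and runtime state, which this sketch does not attempt.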