Prashanth Manohar

Solving Cold Starts in LLMs

About

Building InferX. Eliminating Cold Starts in LLMs. Increasing GPU Utilization by 5X

Badges

Gone streaking

Forums

~1s cold start for a 32B model.

Most setups we've seen fall into two buckets: multi-second to minute cold starts (model load + init), or keeping GPUs warm to avoid that. We've been experimenting with restoring initialized model state instead of reloading weights. This demo shows a ~1s cold start for a 32B model: https://youtu.be/G8DsbS1mcwo
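
To make the "restore instead of reload" idea concrete, here is a toy sketch in plain Python. It is not how InferX works and it involves no GPU; `ToyModel`, `cold_start`, `snapshot`, and `restore` are hypothetical names made up for illustration. The only point it shows is the shape of the two paths: the cold path loads weights and then runs initialization, while the restore path brings back an already-initialized state and skips the initialization step.

```python
import pickle


class ToyModel:
    """Stand-in for a model server; not a real LLM."""

    def __init__(self, weights):
        self.weights = weights
        # Simulated one-time initialization work (a hypothetical stand-in
        # for whatever a real runtime does after loading weights).
        self.lookup = {i: w * 2 for i, w in enumerate(weights)}

    def infer(self, i):
        return self.lookup[i]


def cold_start(n):
    """Cold path: produce weights, then run full initialization."""
    weights = [float(i) for i in range(n)]  # stands in for reading from disk
    return ToyModel(weights)


def snapshot(model):
    """Capture the already-initialized state as bytes."""
    return pickle.dumps(model.__dict__, protocol=pickle.HIGHEST_PROTOCOL)


def restore(blob):
    """Restore path: rebuild the object from saved state, skipping __init__."""
    model = ToyModel.__new__(ToyModel)  # allocate without running __init__
    model.__dict__.update(pickle.loads(blob))
    return model


if __name__ == "__main__":
    warm = cold_start(1000)
    blob = snapshot(warm)
    restored = restore(blob)
    print(restored.infer(10) == warm.infer(10))
```

In a real serving stack the saved state would include far more than a Python dict, so treat this purely as an illustration of the control flow, not of the mechanism.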