Anvitha V

5mo ago

New Updates 9/6

  • You can create a cache for your app that persists between scale-down and scale-up. This lowers cold starts and can be used for things such as tensor caches, vLLM caches, etc.

  • Optimized cold starts to under 200ms when multiple scale-down and scale-up events occur; this is done by freezing VRAM while GPUs are idle.

  • Introduced a Warmed status so you can see which replicas are in that state; these replicas cold-start in under 200ms. We always prioritize starting Warmed replicas before scaling up Idle ones, since they start faster (see the sketch after this list).
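
For a concrete picture of the scale-up priority described above, here is a minimal sketch. The post does not show 8Scale's actual SDK or configuration surface, so every name below (CacheConfig, Replica, pick_replica) is a hypothetical illustration of the behavior, not the platform's API.

```python
# Hypothetical sketch only: names and types are assumptions used to
# illustrate the persistent cache and Warmed-first scale-up described above.
from dataclasses import dataclass


@dataclass
class CacheConfig:
    # Cache that survives scale-down/scale-up, e.g. tensor or vLLM caches.
    path: str = "/cache"
    size_gb: int = 20


@dataclass
class Replica:
    name: str
    state: str  # "Warmed" (VRAM frozen, <200ms start) or "Idle"


def pick_replica(replicas: list[Replica]) -> Replica | None:
    """Prefer Warmed replicas over Idle ones, mirroring the described
    scale-up priority: Warmed replicas cold-start in under 200ms."""
    for wanted in ("Warmed", "Idle"):
        for r in replicas:
            if r.state == wanted:
                return r
    return None


if __name__ == "__main__":
    pool = [Replica("r1", "Idle"), Replica("r2", "Warmed")]
    print(pick_replica(pool))  # -> Replica(name='r2', state='Warmed')
```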

Anvitha V

6mo ago

8Scale - Scale your AI models on our Serverless GPUs

We're launching 8Scale, a serverless platform that connects idle GPUs with AI developers. Deploy AI models instantly, scale automatically across the globe, and pay only for what you use, with the option to scale down to zero. GPU owners earn, devs save.