All activity
Anvitha V started a discussion
New Updates 9/6
You can now create a cache around your app that persists between scale-down and scale-up events. This helps reduce cold starts and can be used for things such as a tensor cache, a vLLM cache, etc. Cold starts are now optimized to under 200 ms when multiple scale-down and scale-up events occur; this is done by freezing VRAM when GPUs are idle. We also introduced a Warmed status, which lets you see replicas in that state; these...
We're launching 8Scale, a serverless platform that connects idle GPUs with AI developers. Deploy AI models instantly, scale automatically across the globe, and pay only for what you use, with the option to scale down to zero. GPU owners earn, devs save.

8Scale: Scale your AI models on our Serverless GPUs
Anvitha V left a comment
We've been working on 8Scale for months to solve the hassle of managing GPUs at scale. Many companies are tackling this problem, but only for the "enterprise tier". It is hard to find anyone doing it well for decentralized GPU compute that is readily available. Our mission is to make peer-to-peer idle compute available through a serverless platform. P2P compute has always been cheap but...

8Scale: Scale your AI models on our Serverless GPUs
