Calculate GPU memory requirements and max concurrent requests for self-hosted LLM inference. Supports Llama, Qwen, DeepSeek, Mistral, and more. Plan your AI infrastructure efficiently.
Built to simplify planning for self-hosted AI deployments.
Unlike other AI infrastructure tools, SelfHostLLM lets you precisely estimate GPU requirements and concurrency for Llama, Qwen, DeepSeek, Mistral, and more using a custom configuration.
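As a rough illustration of the kind of estimate described above, here is a minimal back-of-the-envelope sketch. It is not SelfHostLLM's actual formula, which this page does not state. The function name, the fixed overhead default, and the per-request KV cache figure are all assumptions made for illustration.

```python
def estimate_max_concurrent_requests(
    total_vram_gb: float,
    model_params_billion: float,
    bytes_per_param: float,
    kv_cache_gb_per_request: float,
    overhead_gb: float = 2.0,
) -> int:
    """Hypothetical estimate: memory left after loading the weights,
    divided by the KV cache one request is assumed to need."""
    # 1 billion parameters at N bytes each is roughly N GB of weights.
    model_gb = model_params_billion * bytes_per_param
    # Assumed fixed overhead for runtime, activations, and fragmentation.
    free_gb = total_vram_gb - model_gb - overhead_gb
    if free_gb <= 0 or kv_cache_gb_per_request <= 0:
        return 0  # the model does not fit, or the input is not usable
    return int(free_gb // kv_cache_gb_per_request)


# Example: a 7B model in 16-bit (2 bytes per parameter) on an 80 GB GPU,
# assuming each request needs about 4 GB of KV cache.
print(estimate_max_concurrent_requests(80, 7, 2, 4))
```

Quantization enters through `bytes_per_param` (for example 2 for 16-bit weights or 0.5 for 4-bit), and longer context windows raise the per-request KV cache figure, so both lower or raise the result accordingly.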
Raycast
B̶u̶t̶ n̶o̶w̶ I̶ w̶̶a̶n̶t̶ t̶o̶ s̶e̶e̶ ̶A̶p̶p̶l̶e̶ s̶i̶l̶i̶c̶o̶n̶ ̶a̶d̶d̶e̶d̶ t̶o̶ t̶h̶e̶ m̶i̶x̶!
Update: Now there's a Mac version too!
Agnes AI
Love how SelfHostLLM lets you actually estimate GPU needs for different LLMs, no more guessing and overbuying fr. Super smart idea, really impressed!
Very cool calculator, looking forward to checking this out.
SelfHostLLM
Hi all, I'm the creator of SelfHostLLM.org.
You can read more about why I created it here:
https://www.linkedin.com/posts/e...
SelfHostLLM
Here is the Mac version: https://selfhostllm.org/mac/
SelfHostLLM
@xiaolong_zhu I know some people, like Exo Labs (https://github.com/exo-explore/exo), are working on letting you split bigger models across multiple devices - those can also be multiple Macs. Do note that, depending on how you split the model, it may require fast networking like the latest Thunderbolt 5 - but it certainly can be done.
AltPage.ai
No way, this is exactly what I needed! Figuring out GPU memory for LLMs has always been such a headache—super smart to automate it. Any plans to support multi-GPU setups?
CoSupport AI
Super useful — sizing GPU memory and concurrency upfront saves a ton of headaches. Love that it works with different models.
GPT-4o
This is truly helpful! I've been wrestling with GPU sizing for my self-hosted LLMs, and this tool is a lifesaver. Being able to precisely estimate requirements before I even start spinning up instances is kinda genius imo. Does it work with different quantization methods too?
@erans Thank you sooooo much 🙏🏻. There’s way too much misleading information out there on the internet, even for an AI like Perplexity, making it hard to get the right info. This is a real time-saver! I can now focus on using my private LLM instead of spending days trying to make it work with all the parameters to set and understand. I really hope your site will eventually offer even more useful information and tips. But for now, this is absolutely perfect for me 👌🏻.
SelfHostLLM
A new version is out with updated models, and there's a PC (with NPU) version too.