
SelfHostLLM
Calculate the GPU memory you need for LLM inference
103 followers
Calculate GPU memory requirements and maximum concurrent requests for self-hosted LLM inference. Supports Llama, Qwen, DeepSeek, Mistral, and more. Plan your AI infrastructure efficiently.
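For a rough idea of the kind of estimate such a calculator produces, here is a minimal Python sketch. The formula (weights = parameters × bytes per parameter, plus a per-request KV cache and a fixed overhead) and all the constants are illustrative assumptions, not SelfHostLLM's actual implementation.

```python
# Rough estimate of how many concurrent requests fit in VRAM.
# The formula and defaults below are assumptions for illustration only.

def estimate_max_concurrent_requests(
    gpu_vram_gb: float,        # total VRAM across GPUs
    params_billion: float,     # model size, e.g. 8 for an 8B model
    bytes_per_param: float,    # 2 for FP16, ~0.5 for 4-bit quantization
    num_layers: int,
    hidden_dim: int,
    context_length: int,       # tokens per request (prompt + generation)
    kv_bytes: float = 2.0,     # FP16 KV cache
    overhead_gb: float = 2.0,  # CUDA context, activations, fragmentation
) -> int:
    """Estimate how many requests fit in VRAM at full context length."""
    # Model weights: 1B params at 1 byte/param is roughly 1 GB.
    weights_gb = params_billion * bytes_per_param
    # KV cache per request: 2 (K and V) * layers * hidden_dim * tokens * bytes.
    kv_per_request_gb = (
        2 * num_layers * hidden_dim * context_length * kv_bytes / 1e9
    )
    free_gb = gpu_vram_gb - weights_gb - overhead_gb
    return max(0, int(free_gb // kv_per_request_gb))


# Example: a hypothetical 8B model in FP16 on a 24 GB GPU with 8k context.
print(estimate_max_concurrent_requests(
    gpu_vram_gb=24, params_billion=8, bytes_per_param=2,
    num_layers=32, hidden_dim=4096, context_length=8192,
))
```

With these numbers the weights take about 16 GB, each request's KV cache about 4.3 GB, so only a single full-context request fits; quantization (a smaller `bytes_per_param`) or a shorter context frees room for more.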



This is truly helpful! I've been wrestling with GPU sizing for my self-hosted LLMs, and this tool is a lifesaver. Being able to precisely estimate requirements before I even start spinning up instances is kinda genius imo. Does it work with different quantization methods too?
@erans Thank you sooooo much 🙏🏻. There’s way too much misleading information out there on the internet, even for an AI like Perplexity, making it hard to get the right info. This is a real time-saver! I can now focus on using my private LLM instead of spending days trying to make it work with all the parameters to set and understand. I really hope your site will eventually offer even more useful information and tips. But for now, this is absolutely perfect for me 👌🏻.
SelfHostLLM
A new version is out with updated models, and a PC (with NPU) version too.