How do you benchmark your local LLM performance?
Hey everyone!
I've been running a lot of local LLMs (Llama, Mistral) and Diffusers pipelines on my machine lately, but I always struggle to measure their performance accurately.
Usually, I just look at "tokens/sec" in the terminal, but it feels inconsistent.
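For context, here's roughly what my current measurement looks like: a minimal sketch assuming llama-cpp-python and a local GGUF model (the model path, prompt, and run count are placeholders I made up, not part of any tool). Warm up once, then take the median of a few runs so a single slow generation doesn't skew the number.

```python
import time
import statistics
from llama_cpp import Llama

# Hypothetical model path; swap in whatever GGUF you actually run.
llm = Llama(model_path="./mistral-7b.Q4_K_M.gguf", n_ctx=2048, verbose=False)
prompt = "Explain the difference between latency and throughput in one paragraph."

def timed_run(max_tokens: int = 128) -> float:
    """Return completion tokens per second for one generation.

    Note: the elapsed time includes prompt processing, so this is
    end-to-end throughput, not a pure decode rate.
    """
    start = time.perf_counter()
    out = llm(prompt, max_tokens=max_tokens, temperature=0.0)
    elapsed = time.perf_counter() - start
    return out["usage"]["completion_tokens"] / elapsed

timed_run(max_tokens=16)                 # warm-up run (model load, cache fill)
rates = [timed_run() for _ in range(5)]  # repeat to smooth out run-to-run jitter
print(f"median: {statistics.median(rates):.1f} tok/s, "
      f"spread: {min(rates):.1f}-{max(rates):.1f} tok/s")
```

Even then, the timing lumps prompt processing in with decoding, so I'm not sure it's the right way to report the number.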
How do you guys benchmark your local AI setup? Do you use any specific tools, or just rely on vibes?
I'm actually building an open-source tool (PKC Mark) to standardize this. Would love to hear your thoughts on what metrics matter most to you!
Happy coding!


Replies
Hi. We made a tool for this using guided influence benchmarking in Dowser. It's meant as a one-stop shop for measuring the impact of training data on LMs.
Note: it's limited to LMs, not LLMs, at the moment.
Feel free to give it a shot.