How do you benchmark your local LLM performance?
Hey everyone!
I've been running a lot of local LLMs (Llama, Mistral) and Diffusers pipelines on my machine lately, but I always struggle to measure their performance accurately.
Usually, I just look at "tokens/sec" in the terminal, but it feels inconsistent.
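For context, here's roughly what my current measurement looks like: a minimal sketch assuming llama-cpp-python and a local GGUF model (the model path, prompt, and run count are placeholders I made up, not part of any tool). Warm up once, then take the median of a few runs so a single slow generation doesn't skew the number.

```python
import time
import statistics
from llama_cpp import Llama

# Hypothetical model path; swap in whatever GGUF you actually run.
llm = Llama(model_path="./mistral-7b.Q4_K_M.gguf", n_ctx=2048, verbose=False)
prompt = "Explain the difference between latency and throughput in one paragraph."

def timed_run(max_tokens: int = 128) -> float:
    """Return completion tokens per second for one generation.

    Note: the elapsed time includes prompt processing, so this is
    end-to-end throughput, not a pure decode rate.
    """
    start = time.perf_counter()
    out = llm(prompt, max_tokens=max_tokens, temperature=0.0)
    elapsed = time.perf_counter() - start
    return out["usage"]["completion_tokens"] / elapsed

timed_run(max_tokens=16)                 # warm-up run (model load, cache fill)
rates = [timed_run() for _ in range(5)]  # repeat to smooth out run-to-run jitter
print(f"median: {statistics.median(rates):.1f} tok/s, "
      f"spread: {min(rates):.1f}-{max(rates):.1f} tok/s")
```

Even then, the timing lumps prompt processing in with decoding, so I'm not sure it's the right way to report the number.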
How do you guys benchmark your local AI setup? Do you use any specific tools, or just rely on vibes?
I'm actually building an open-source tool (PKC Mark) to standardize this. Would love to hear your thoughts on what metrics matter most to you!
Happy coding!


Replies
Hi. We made a tool for this using guided influence benchmarking in Dowser. It's meant as a one-stop shop for measuring the impact of training data on LMs.
Note: it's limited to LMs, not LLMs, at the moment.
Feel free to give it a shot.