박기철

How do you benchmark your local LLM performance? 🤔

Hey everyone! 👋

I've been running a lot of local LLMs (Llama, Mistral) and Diffusers lately on my machine. But I always struggle to accurately measure their performance.

Usually, I just look at "tokens/sec" in the terminal, but it feels inconsistent. 😅
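
To make the inconsistency concrete, here is a minimal sketch of a steadier tokens/sec measurement: one warm-up call, several timed runs, then the median and the spread instead of a single number. The `generate` callable is a hypothetical stand-in for whatever runs the model and returns the number of tokens it produced; `fake_generate` only simulates one so the script runs on its own.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_tokens_per_sec(
    generate: Callable[[str], int],
    prompt: str,
    warmup: int = 1,
    runs: int = 5,
) -> Dict[str, float]:
    """Time repeated generations and summarize tokens/sec.

    `generate` is a hypothetical callable: it takes a prompt and
    returns how many tokens were generated for it.
    """
    # Warm-up calls are not timed; first calls often include model
    # loading or cache setup and would skew the numbers.
    for _ in range(warmup):
        generate(prompt)

    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        n_tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        rates.append(n_tokens / elapsed)

    return {
        "median": statistics.median(rates),
        "min": min(rates),
        "max": max(rates),
        # Sample standard deviation needs at least two runs.
        "stdev": statistics.stdev(rates) if len(rates) > 1 else 0.0,
    }


def fake_generate(prompt: str) -> int:
    """Stand-in for a real model call (assumption, for illustration only)."""
    time.sleep(0.01)  # pretend generation takes about 10 ms
    return 32         # pretend 32 tokens were produced


if __name__ == "__main__":
    stats = benchmark_tokens_per_sec(fake_generate, "Hello", warmup=1, runs=5)
    for name, value in stats.items():
        print(f"{name}: {value:.1f} tokens/sec")
```

Reporting the median alongside min/max makes it easier to tell whether a run-to-run difference is real or just noise.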

How do you guys benchmark your local AI setup? Do you use any specific tools, or just rely on vibes?

I'm actually building an open-source tool (PKC Mark) to standardize this. Would love to hear your thoughts on what metrics matter most to you!

Happy coding! 💻

Replies

Victor Strandmoe

Hi. We made a tool for this using guided influence benchmarking in Dowser. It's meant as a one-stop shop for measuring the impact of training data on LMs.

Note: it's limited to LMs, not LLMs, at the moment.

Feel free to give it a shot.