Zac Zuo

Stax - Move your LLM evals from vibes to data

Stax is a tool from Google Labs to solve LLM evaluation. Move beyond "vibe testing" by building custom autoraters to measure what matters to you. It's a full toolkit for testing your AI stack with your data, with support for all major model providers.

Zac Zuo

Hi everyone!

Stax is one of the few products I've seen recently that got me genuinely excited. It tackles a core problem for anyone building with LLMs: how to objectively evaluate output quality beyond just "vibe testing." I've already started using it with my internal dev team.

It solves two major headaches right away. First, it integrates with all the major model providers, so you're not stuck building your own testing harnesses. Second, the way you can batch test across custom use cases is incredibly convenient.

One of my team members responsible for QA summed it up perfectly, and I quote:

"I wish I had this a few months ago!"

Tony Tong

@zaczuo Stax feels like a real step forward from “vibe testing.” The integrations and batch testing are clear wins. I wonder, though: how does it approach subjective trade-offs, such as when creativity and accuracy pull in different directions?

Abdul Rehman

Wishing you the best with the launch. Keep up the awesome work!

Bhautik Khunt

@sara_wiltberger How many active projects does Alphabet have in total?

Daniel Lee

This is really cool! Actually, I think this is what every vibe coding platform should embed.