Steven Willmott

Builder at Safe Intelligence
Spec27 is a validation platform for AI agents. It helps teams move beyond manual, vibes-based testing by using machine-readable specifications to generate broader test coverage, catch regressions earlier, and validate both in-house and third-party systems without needing SDK integration or code-level access.
Spec27: Spec-driven testing for AI agents and AI apps
Steven Willmott started a discussion

What kind of Agent validation are you doing today?

Everything started with model evals and benchmarks (which model is better?), then evolved to prompt management, and from there to trace analysis. What are people doing today, and how are they sourcing test datasets?