Launching today

Spec27

Launching today

Spec-driven testing for AI agents and AI apps

37 followers

Spec-driven testing for AI agents and AI apps

37 followers

Visit website

Automation tools

•

Testing and QA software

Spec27 is a validation platform for AI agents. It helps teams move beyond manual, vibes-based testing by using machine-readable specifications to generate broader test coverage, catch regressions earlier, and validate both in-house and third-party systems without needing SDK integration or code-level access.

Free

Launch tags:SaaS•Developer Tools•Artificial Intelligence

Launch Team / Built With

Wispr Flow: Dictation That Works Everywhere — Stop typing. Start speaking. 4x faster.

Stop typing. Start speaking. 4x faster.

Promoted

Spec27

Maker

📌

Hello Product Hunt, excited to be here! For the past three years, we’ve been working on formal verification of machine learning models and looking for ways to get deep, relevant test coverage. With language-model-based applications, this is particularly hard since the models and input spaces are massive, plus you often don’t have access to the underlying model. So the techniques from formal verification developed for vision and tabular data don’t translate well, even for tightly constrained use cases. Thinking this through from first principles gave us the epiphany that what you really need for effective testing is a good way to specify the behavior of the agents you’re testing. So Spec27 was born to do this. The platform allows you to create specs that specify behavior in various ways, capture what you want an agent to be robust to, and then automatically generate sets of test cases around this. The approach we’ve taken is also platform- and LLM-agnostic, so integrations can connect to pretty much any agent, no matter where it is hosted. There’s no SDK or AI gateway integration needed. The product is free and in early access, and we’d love to get people on board and trying things out. The direct link to sign up is here: https://dashboard.spec27.ai/signup/ If you have a specific use case in mind, please also reach out and schedule a chat. Details are here: https://www.spec27.ai/ Made with ❤️ in London. Looking forward to your thoughts!

Report

21h ago

@steven_willmott a getting deep relevant test coverage for LLMs is basically Mission Impossible right now because the input space is infinite. Specifying what the agent should be robust to, rather than just hoping it doesn't break, is a huge mental shift. Does Spec27 handle adversarial prompt generation as part of the automatic test sets?

Report

8h ago

Spec27

Maker

Thanks @priya_kushwaha1, yes, that's the biggest part of the challenge: infinite input space + it's not necessarily continuous (so a tiny shift in input might lead to a massive shift in the output). In response to your question, yes, in the platform we have a growing list of adversarial methods that perturb the inputs in different way. You can select which to use, and it'll effectively do a search in that adjacent input space. We use semantic similarity to keep the tests similar despite the variation.

Report

7h ago

@steven_willmott really helpful context, especially the bit about non-continuous input spaces. i'll check out the platform and see how it handles our specific edge cases. congrats on the ship!

Report

7h ago

Looking forward to try Spec27. The non-determinisim of agent results is ok if the result is an equivalent meaning, but when it goes off to hallucinate a different meaning the outcome can range from oops to a total disaster. The trust of the end users and of the company employees is difficult to repair once the agent has broken that trust. This looks like it can take us on that path to improve the quality of outcomes, and build better trust in the agents. Very nice!

Report

5h ago

Spec27

Maker

Thank you @markcheshire ! Yes, non-determinism is fine if it's just smoothing out the variance in inputs, but getting to the same equivalent result. Often, though, there are sharp edges where tiny changes tip the system to do something very different. We've tested a lot of different agent-building platforms, and each has its own nuance to look out for.

Report

5h ago

Typeform

As the agent space is getting more structured we need better tooling. Can’t wait to try s a S27! Does it matter which framework I’m using?

Report

8h ago

Spec27

Maker

Thanks so much@picsoung - means a lot! Doesn't matter which framework you're using to build the agents. We have a Javascript WASM engine that connects to pretty much anything (and we'll help if it's custom). The testing makes no assumptions about how the agent is built or about your access to the backend.

What are you building agents in?

Report

8h ago

Spec27

Maker

Excited to be part of the team launching /Spec27 today! I care a lot about making AI safer in practice, so it’s really nice to share something we’ve been building around that. Happy to chat with anyone working on agents, evals, or validation :)

Report

8h ago

Forum Threads

p/spec27-early-access

•

20h ago

What kind of Agent validation are you doing today?

Everything started with model Evals and benchmarks (which model is better?), then evolved to prompt management and from there to analyzing traces. What do people do today, and how are they sourcing test datasets?

View all