Building TestForge AI in public. I'm launching here on PH next Tuesday (June 9), and I want to gut-check the problem framing with this community before launch day.
My working hypothesis is that most QA teams spend half their week on plumbing: writing Page Object boilerplate, fixing brittle locators after every UI change, and triaging flaky failures that turn out to be environment issues, not real bugs. The actual signal (real defects) gets drowned in the noise.
So I built TestForge AI to remove the plumbing. Paste a requirement and it drafts Gherkin scenarios, generates the Playwright TypeScript test files (using Microsoft Playwright MCP to scrape the live DOM and pick stable selectors), and runs everything in disposable containers. When something fails, an AI analyst built on Anthropic's Claude classifies it (real bug vs flake) and explains it in plain English.
The technical bet: deterministic-first, AI-second. A rules engine handles the common cases instantly; Claude only gets consulted for the 1-2% of edge cases where the deterministic layer is uncertain. Every classification shows you why it was made.
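
For anyone who wants the idea in code, here is a rough sketch of what deterministic-first, AI-second can look like. Everything in it is illustrative rather than the actual TestForge AI code: the types, the rule names, the regex patterns, and the `askLlm` callback (a stand-in for whatever calls Claude) are all assumptions made for the example.

```typescript
// Illustrative sketch only: names, rules, and types are hypothetical.
type Verdict = "real_bug" | "flake";

interface Failure {
  testName: string;
  errorMessage: string;
}

interface Classification {
  verdict: Verdict;
  source: "rule" | "llm";
  reason: string; // every classification carries its own explanation
}

interface Rule {
  name: string;
  pattern: RegExp;
  verdict: Verdict;
}

// Assumed example rules; a real rule set would be larger and tuned on real failures.
const RULES: Rule[] = [
  { name: "network-error", pattern: /ECONNRESET|ECONNREFUSED|ETIMEDOUT/, verdict: "flake" },
  { name: "assertion-mismatch", pattern: /AssertionError/, verdict: "real_bug" },
];

// Deterministic layer: returns a classification, or null when no rule applies.
function classifyByRules(failure: Failure): Classification | null {
  for (const rule of RULES) {
    if (rule.pattern.test(failure.errorMessage)) {
      return {
        verdict: rule.verdict,
        source: "rule",
        reason: `matched rule "${rule.name}"`,
      };
    }
  }
  return null;
}

// Hypothetical fallback signature; the model call itself is out of scope here.
type LlmFallback = (failure: Failure) => Promise<Classification>;

// Rules first; the model is consulted only when the rules are uncertain.
async function classify(failure: Failure, askLlm: LlmFallback): Promise<Classification> {
  return classifyByRules(failure) ?? (await askLlm(failure));
}
```

The point of splitting it this way is that the common cases never wait on a model call, and the `reason` field is filled in on both paths, so a rule hit and a model verdict are equally explainable.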