PromptBench - Version, test & compare your AI prompts across models

4mo ago

Stop guessing which prompts work. PromptBench lets you run the same prompt on Claude, GPT-5, o3, and Mistral side-by-side, score outputs 1-10, and track performance over time with analytics. Features: multi-model playground, prompt versioning, scoring, analytics dashboard, chat & complete modes. 10 models supported. Free with your own API keys. Pro $12/mo with managed credits.
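To make the workflow above concrete, here is a minimal sketch of "same prompt, several models, a 1-10 score per output, averaged per prompt version". It is not PromptBench's actual code or API: `PromptVersion`, `Run`, `run_side_by_side`, and `average_scores` are hypothetical names, and the models are passed in as plain functions so the sketch runs without any API keys.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class PromptVersion:
    """One immutable revision of a prompt (hypothetical structure)."""
    name: str
    version: int
    text: str


@dataclass
class Run:
    """One model's output for one prompt version, with a 1-10 score."""
    prompt: PromptVersion
    model: str
    output: str
    score: int


def run_side_by_side(
    prompt: PromptVersion,
    models: Dict[str, Callable[[str], str]],
    scorer: Callable[[str], int],
) -> List[Run]:
    """Send the same prompt text to every model and score each output."""
    runs = []
    for model_name, call_model in models.items():
        output = call_model(prompt.text)
        score = scorer(output)
        if not 1 <= score <= 10:
            raise ValueError(f"score must be between 1 and 10, got {score}")
        runs.append(Run(prompt, model_name, output, score))
    return runs


def average_scores(runs: List[Run]) -> Dict[Tuple[int, str], float]:
    """Average score for each (prompt version, model) pair."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[(run.prompt.version, run.model)].append(run.score)
    return {key: mean(scores) for key, scores in grouped.items()}
```

In practice each entry in `models` would wrap a real provider client and `scorer` would be a human rating or a rubric. Collecting `Run` records across versions and calling `average_scores` gives a simple view of whether version 2 of a prompt beats version 1 on a given model.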

Replies

Maker

Hey PH! I built PromptBench because I was tired of testing prompts in 5 different tabs and losing track of what worked. The core insight: prompt engineering is iterative, but nobody treats it that way. We version our code, A/B test our UIs — why not our prompts? Tech stack: Next.js, Supabase, shadcn/ui, Stripe. Supports Claude, GPT-5, GPT-4o, o3, Mistral. Free tier is BYOK (bring your own API keys) — no catch, no credit card. Would love your feedback!

4mo ago