Maiki Takano

PromptDiff - Compare LLM outputs across models. One API call.

Stop copy-pasting prompts across ChatGPT, Claude, and Gemini. PromptDiff compares LLM outputs in one API call. Send a prompt and pick models; get back output, latency (ms), tokens, and cost (USD) per model.

8 models across 4 providers:
- Claude Sonnet & Haiku
- GPT-4o & 4o-mini
- Gemini Pro & Flash
- Grok 3 & 3 Mini

No SDK needed. Works with curl, Python, and TypeScript. Free tier: 100 evals/month, no card required. Built by a solo dev from Tokyo.
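The one-call flow above can be sketched in Python. The endpoint path (`/v1/eval`), model identifiers, and response field names here are assumptions for illustration; the real API shape may differ:

```python
import json

# Hypothetical endpoint path -- check the docs for the real one.
API_URL = "https://promptdiff.bizmarq.com/v1/eval"

def build_request(prompt, models):
    """Assemble the one-call request body: a single prompt, many models."""
    return {"prompt": prompt, "models": models}

payload = build_request(
    "Summarize this changelog in one sentence.",
    ["claude-3-haiku", "gpt-4o-mini", "gemini-flash"],
)
body = json.dumps(payload)  # POST this with any HTTP client (curl, requests, fetch)

# Illustrative response shape: per-model output, latency, tokens, cost.
response = {
    "results": [
        {"model": "claude-3-haiku", "output": "...", "latency_ms": 412,
         "tokens": 96, "cost_usd": 0.00021},
        {"model": "gpt-4o-mini", "output": "...", "latency_ms": 388,
         "tokens": 101, "cost_usd": 0.00018},
    ]
}

# With structured results, picking a winner is one line -- e.g. cheapest:
cheapest = min(response["results"], key=lambda r: r["cost_usd"])
print(cheapest["model"])  # → gpt-4o-mini
```

Because the response is plain JSON, the same comparison works from curl plus `jq`, or from TypeScript with `fetch`.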


Maiki Takano
Hunter
Hey Product Hunt! I'm Maiki, a solo developer based in Japan. I built PromptDiff in 2 days as a side project, and I wanted to share the story behind it.

The problem I kept hitting

Every time I built something with LLMs, I had the same frustrating workflow: open ChatGPT, paste the prompt, copy the output. Switch tabs, open Claude, paste again. Switch tabs again for Gemini. Then try to compare three walls of text side by side.

There are great observability platforms (Braintrust, LangSmith), but they're all platforms: SDK installs, dashboard configs, dataset uploads. That's overkill when you just want to answer "which model handles this prompt best?" I wanted something like curl: one call, structured output, done.

What PromptDiff does

- POST a prompt and a list of models
- Get back: outputs, latency in ms, token counts, cost in USD
- Works with Claude, GPT-4o, Gemini, and Grok (8 models)
- No SDK required, just plain HTTP
- CI/CD friendly (JSON in, JSON out)

Pricing

The free tier is 100 evals/month with no credit card required. Paid is usage-based, with no subscription.

What I'd love feedback on

- Is raw output + latency + cost enough, or do you want scoring too?
- Which models do you wish were supported?
- Would you use this in CI/CD for prompt regression testing?

Try it at https://promptdiff.bizmarq.com

Happy to answer any questions below!
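On the CI/CD question: since the output is JSON in, JSON out, a prompt regression gate can be a short script. This is a minimal sketch, assuming results have already been captured from a (hypothetical) PromptDiff call; the field names and thresholds are illustrative, not part of any documented contract:

```python
import sys

# Illustrative results, as captured from a hypothetical PromptDiff call in CI.
results = [
    {"model": "gpt-4o-mini", "latency_ms": 420, "cost_usd": 0.00019},
    {"model": "claude-3-haiku", "latency_ms": 510, "cost_usd": 0.00023},
]

# Per-prompt budgets; tune these for your own use case.
MAX_LATENCY_MS = 2000
MAX_COST_USD = 0.001

# Collect any model that blew its latency or cost budget.
failing = [r["model"] for r in results
           if r["latency_ms"] > MAX_LATENCY_MS or r["cost_usd"] > MAX_COST_USD]

if failing:
    # A nonzero exit code fails the CI job, flagging the prompt regression.
    print("Prompt regression in:", ", ".join(failing))
    sys.exit(1)

print("All models within budget")
```

The same gate could diff outputs against golden answers instead of (or alongside) latency and cost; that is where the scoring question in the feedback list comes in.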