What's the biggest API cost difference you've found between LLMs?

Hey guys,

I built Test AI Models because I kept seeing wild cost variations for similar-quality outputs across ChatGPT, Grok, DeepSeek and others.

Biggest one I've found so far, for a customer support use case (images below):

ChatGPT: $6,654 per 1 million queries
Claude: $4,947 per 1 million queries
DeepSeek: $79 per 1 million queries
Grok: $245 per 1 million queries

To my eye the outputs are similar in quality, but ChatGPT comes out about 84x more expensive than DeepSeek!

At 1 million queries you can save $6,575 just by choosing the right model.
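
If you want to sanity-check the math yourself, here's a rough Python sketch. The figures are whole-dollar costs per 1 million queries taken from the list above; the `cost_gap` helper is just a name I picked for illustration, not something from the site.

```python
# Costs in USD per 1 million queries, taken from the list above.
costs = {"ChatGPT": 6654, "Claude": 4947, "DeepSeek": 79, "Grok": 245}


def cost_gap(costs):
    """Return (cheapest, priciest, price ratio, absolute savings)."""
    cheapest = min(costs, key=costs.get)
    priciest = max(costs, key=costs.get)
    ratio = costs[priciest] / costs[cheapest]
    savings = costs[priciest] - costs[cheapest]
    return cheapest, priciest, ratio, savings


cheapest, priciest, ratio, savings = cost_gap(costs)
print(f"{priciest} vs {cheapest}: {ratio:.0f}x more expensive, ${savings:,} saved per 1M queries")
```

Swap in your own per-million numbers to see the gap for your use case.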

What's the biggest savings you've discovered?

Whether you tested on Test AI Models or elsewhere, I'm curious what cost gaps people are finding in the wild.

Bonus: Did you actually switch models after discovering the difference, or stick with your original choice for other reasons?

You can test your case here: www.testaimodels.com
