What's the best AI model for coding?

New AI models pop up every week. Some developer tools let you choose between different models, while more opinionated products default to a single model.

Curious what the community recommends for coding tasks? Any preferences?

I've been using Opus 4.5 via Claude Code. Full disclosure, I've been quite skeptical about AI development, but have been consulting for an Anthropic partner and have been using it for some internal tooling. It can be pretty powerful if used correctly, although it still makes a lot of questionable decisions and mistakes, and needs decent oversight.

A few tips:
👉 You want to stay within about 40% of the context window. It's the sweet spot to get the best results.

👉 I first plan, and ask CC to create a feature spec, ideally broken down into stages.
👉 ALWAYS start with a clean working tree

👉 I start a new session, and ask it to read through the feature spec and implement stage 1

👉 Review the diff and make any necessary changes.

👉 Commit, and start a new session.

👉 Repeat.
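
Two of the tips above (start from a clean working tree, stay within about 40% of the context window) are easy to check mechanically. Here is a minimal sketch, not anything Claude Code itself provides: the function names are made up for illustration, and the 4-characters-per-token estimate is a rough assumption, not a real tokenizer. The only external call is `git status --porcelain`, which prints nothing when the tree is clean.

```python
import subprocess

CONTEXT_BUDGET = 0.40  # the "about 40%" rule of thumb from the tips above


def tree_is_clean(porcelain_output: str) -> bool:
    """True when `git status --porcelain` printed nothing (no pending changes)."""
    return porcelain_output.strip() == ""


def git_porcelain() -> str:
    """Run `git status --porcelain` in the current directory and return its output."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; the chars-per-token ratio is an assumption."""
    return int(len(text) / chars_per_token)


def within_budget(used_tokens: int, window_tokens: int,
                  budget: float = CONTEXT_BUDGET) -> bool:
    """True while the session uses no more than `budget` of the context window."""
    return used_tokens <= window_tokens * budget
```

Before a new session you could call `tree_is_clean(git_porcelain())` and refuse to start if it returns `False`; `within_budget` only helps if you have some way to read how many tokens the session has used so far.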

I also try to stay away from very in-depth instructions. They tend to overload the model, and it ends up skipping some of them. I also keep a directory with feature definitions (generated by CC) as well as a folder for specific todos. (Basically skills, but let's be real: agents and skills are just prompt text files.) This way, at the start of a new session I can point it to a specific feature definition without it having to read through the codebase to figure out how the feature works, which saves a lot of context.

Can confirm about context: once you start stuffing the whole repo into the model, the quality of the solutions drops, especially with backend code.

Any workarounds or best practices to improve the overall quality of the output in this case?

The best practice I’ve settled on is keeping an up-to-date file (project tree + short module descriptions). I feed it at the start of the session so the model understands the architecture without having to read every file. Then I only add the specific 2-3 files that actually need changes. The less irrelevant code in the prompt, the fewer hallucinations you get.
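
A file like that can be generated rather than maintained by hand. The sketch below is one possible way to do it, assuming a Python codebase where each module's docstring starts with a one-line description; the function names are hypothetical, and the commenter's actual file format is not known.

```python
import ast
from pathlib import Path


def module_summary(path: Path) -> str:
    """Return the first line of a module's docstring, or a placeholder."""
    try:
        tree = ast.parse(path.read_text(encoding="utf-8"))
    except (SyntaxError, UnicodeDecodeError):
        return "(could not parse)"
    doc = ast.get_docstring(tree)
    return doc.splitlines()[0] if doc else "(no description)"


def build_project_map(root: Path) -> str:
    """List every .py file under `root` with a one-line description."""
    lines = []
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root).as_posix()
        lines.append(f"{rel}: {module_summary(path)}")
    return "\n".join(lines)
```

Calling `build_project_map(Path("."))` and saving the result to a text file gives you something to paste at the start of a session. Regenerating it on each commit keeps it from going stale.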

Agree that model selection depends heavily on the task. We found that the cost difference between models on the same task can be huge. Using aicosts.ai to track per-model cost per task type helped us route work intelligently. Sonnet for complex refactoring, a cheaper model for tests and linting. Same output quality, 40% less spend.
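
The routing idea can be sketched in a few lines. This is not how aicosts.ai works (its API is not shown in this thread); the model names, prices, and routing table below are all hypothetical placeholders you would replace with your own numbers.

```python
from collections import defaultdict

# Hypothetical prices in USD per million tokens (input, output); not real quotes.
PRICES = {
    "strong-model": (3.00, 15.00),
    "cheap-model": (0.50, 2.00),
}

# Hypothetical routing table: task type -> model name.
ROUTES = {
    "refactor": "strong-model",
    "tests": "cheap-model",
    "lint": "cheap-model",
}


def pick_model(task_type: str) -> str:
    """Route known task types; fall back to the strong model when unsure."""
    return ROUTES.get(task_type, "strong-model")


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call in USD given token counts and the price table."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


class CostLedger:
    """Accumulates spend per (task type, model) pair."""

    def __init__(self) -> None:
        self.totals = defaultdict(float)

    def record(self, task_type: str, input_tokens: int, output_tokens: int) -> str:
        """Pick a model for the task, add its cost to the ledger, return the model."""
        model = pick_model(task_type)
        self.totals[(task_type, model)] += call_cost(model, input_tokens, output_tokens)
        return model
```

Defaulting unknown task types to the stronger model is a deliberate choice here: a misrouted hard task usually costs more in rework than the tokens it saves.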

For me, Sonnet 4.6 offers the best value for money; its coding capabilities are good. Opus consumes too many tokens, so I can't afford it.

Genuinely good advice. I wish I had found it two months earlier; I had to learn the hard way, burning tokens like crazy.
But the steps you describe are the gold standard and work every time.

They say it's Claude, but in Cursor I never see the difference, especially because ChatGPT has a bigger context window.

 any thoughts on Composer 1?

But it's also about speed of development. Try using Composer instead of ChatGPT: when you have 5 agents coding at the same time, it gives you a significant productivity boost without quality loss.

Not that simple in my experience; it depends on tech stack and task complexity (I see different winners on different datasets). Overall, I’d go with Claude Opus 4.5 for complex tasks, but Gemini Pro for UI/frontend and, for speed, most other tasks.

depends on tech stack and task complexity

exactly - for for example, they recently launched their benchmark, updated daily, and currently, 's GPT 5.3 Codex (xhigh) is the most performant, achieving 90% on evals out of the box.

Opus 4.5?

 my favorite. to quote , Opus is "on a different level, unreasonably good at " -

It's the best, but also the most expensive.

Does pricing also count as a criterion? 😅 Sonnet is quite expensive. Running GLM 4.7 and Codex 5.2 full-time is only a few percent behind Sonnet and costs 3–7x less.

Haha, yes! There are definitely many variables we could take into account: capabilities, speed, pricing...

 Absolutely, cost matters a lot.

 I agree! There are alternatives to Sonnet, for sure. However, Sonnet is now recognized as the best (coupled with Claude Code) by a huge portion of developers, and I know many who don't want to take the risk of trying a cheaper model.
This is where token compression strategies come into play.
What if we could use Sonnet for the price of a Haiku?

 Reputation matters, for sure, and many devs prefer sticking with what’s widely seen as the best. And Sonnet is definitely a high-quality model. BUT 😁 if you’re building a SaaS product powered by AI, pricing is important. It directly impacts margins, scalability, and how competitive you can be. If alternatives are only a few percent behind at 3–7x lower cost (+ you can train them to achieve better quality), that gap is hard to ignore.

If there is a way to bring Sonnet closer to Haiku pricing, that’s a different story. Then it becomes a smart optimization rather than a tradeoff.
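
To put a number on that: if spend scales linearly with tokens (a simplifying assumption that ignores caching and separate input/output pricing), matching a model that is N times cheaper means cutting token usage to 1/N. A tiny sketch, with a hypothetical function name:

```python
def required_token_reduction(price_ratio: float) -> float:
    """Fraction of tokens you must cut to match a model `price_ratio` times cheaper.

    Assumes cost is proportional to token count, which is a simplification.
    """
    if price_ratio < 1:
        raise ValueError("price_ratio must be >= 1")
    return 1.0 - 1.0 / price_ratio


# For the 3-7x range mentioned in this thread:
low = required_token_reduction(3)   # about 0.667, i.e. roughly two thirds fewer tokens
high = required_token_reduction(7)  # about 0.857
```

So token compression alone would have to remove roughly 67% to 86% of the tokens to close a 3–7x price gap, before accounting for any quality lost in the compression itself.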

For , we use Sonnet 4.5 under the covers (though we're always evaluating the best model) as we've found it's got the right mix of strong agentic coding performance while being relatively fast. Other models are also quite good but tend to have very high latency and go heads-down for a long time.

What we've learned from our users is that while they want to operate at a high level, they also want to see granular progress from the agents — classic manager behavior. Sonnet has the right mix of capability and speed for this.

 Oh I like the reasoning, i.e. finding a balance between speed and capabilities.

What would be the close 2nd? And if we look at the capabilities only, what would be the best model from your POV? Also, why not Opus 4.5!?

 Opus is definitely a contender. It's more expensive but very capable and at least in my experience not terribly slower. I've found Codex to be quite smart but it likes to go "heads-down" for a long time before coming back with an answer -- which doesn't fit with the pattern of usage that we see from our users.

 thoughts on the all-new ?

In Cursor, I have tried Opus 4.5 and GPT 5.2 in plan mode, and personally, I prefer the former. However, I’m still torn on the best setup for fixing bugs. What are your preferences for debug mode? Do you stick with the same model or switch to a new one?

 For debugging, nothing beats GPT-5.2x High right now

 oh good to know! thanks for the suggestion

In Cursor, we use Opus 4.5... I couldn't find a better model than that. Too bad it's not on the list.

 any experiences with Composer 1?

I tested it on small tasks and it’s … ok. But not for building full features.

  ftw haha

Opus 4.5. The only downside is that it consumes tokens very quickly, so my second choice is Sonnet 4.5.

I use Sonnet 4.5 the vast majority of the time. It is fast, precise, and very robust!

is leading the way

 For my side projects, I currently use DeepSeek for agentic work. I usually refine features in ChatGPT first, then hand them off to DeepSeek for execution. So far, the setup works really well with minimal cost—about $2–3 per day of coding, plus my ChatGPT subscription, which I’d have anyway even if I weren’t coding.
