What's the best AI model for coding?

Tabstack by Mozilla

Featured•5mo ago

New AI models pop up every week. Some developer tools like @Cursor, @Zed, and @Kilo Code let you choose between different models, while more opinionated products like @Amp and @Tonkotsu default to a single model.

Curious what the community recommends for coding tasks. Any preferences?

3.5K views

Replies

I've been using Opus 4.5 via Claude Code. Full disclosure, I've been quite skeptical about AI development, but have been consulting for an Anthropic partner and have been using it for some internal tooling. It can be pretty powerful if used correctly, although it still makes a lot of questionable decisions and mistakes, and needs decent oversight.

A few tips:
👉 You want to stay within about 40% of the context window. It's the sweet spot to get the best results.
👉 I first plan, and ask CC to create a feature spec, ideally broken down into stages.
👉 ALWAYS start with a clean working tree.
👉 I start a new session, and ask it to read through the feature spec and implement stage 1.
👉 Review the diff and make any necessary changes.
👉 Commit, and start a new session.
👉 Repeat.
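
Two of the tips above (the context budget and the clean working tree) can be checked mechanically before each session. A minimal Python sketch, assuming your tool exposes a token count and that `git` is on the path; the function names and the 0.4 default are illustrative, not part of Claude Code:

```python
import subprocess

def within_context_budget(used_tokens: int, window_tokens: int, fraction: float = 0.4) -> bool:
    """Return True while usage stays at or below the chosen share of the window."""
    return used_tokens <= window_tokens * fraction

def is_clean_tree(porcelain_output: str) -> bool:
    """`git status --porcelain` prints nothing when the working tree is clean."""
    return porcelain_output.strip() == ""

def working_tree_is_clean(repo_dir: str = ".") -> bool:
    """Run git in repo_dir and report whether there are uncommitted changes."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return is_clean_tree(result.stdout)
```

For example, `within_context_budget(70_000, 200_000)` is true, so the session can continue; once it turns false, commit and start a fresh session.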

I also try to stay away from very in-depth claude.md instructions. They tend to confuse the model and overload it, so it ends up skipping instructions. I also keep a directory with feature definitions (generated by CC) as well as a folder for specific todos. (Basically skills, but let's be real: agents and skills are just prompt text files.) This way, at the start of a new session I can point it to a specific feature definition without it having to read through the codebase to figure out how the feature works, which saves a lot of context.
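
The feature-definition approach can be sketched as a small helper that turns one definition file into the opening message of a new session. This is only an illustration: the function names, the prompt wording, and the idea of one file per feature are assumptions, not something Claude Code prescribes.

```python
from pathlib import Path

def build_session_prompt(feature_name: str, spec_text: str, stage: int) -> str:
    """Compose the opening message for a fresh session from one feature definition."""
    return (
        f"Read the feature definition for '{feature_name}' below and "
        f"implement stage {stage} only.\n"
        "Do not scan the rest of the codebase unless the definition is insufficient.\n\n"
        f"{spec_text}"
    )

def prompt_from_file(feature_path: Path, stage: int) -> str:
    """Load a definition file (hypothetical layout: one file per feature)."""
    spec_text = feature_path.read_text(encoding="utf-8")
    return build_session_prompt(feature_path.stem, spec_text, stage)
```

The point of the design is that the model reads one short file instead of exploring the repository, which keeps the session well under the context budget.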

5mo ago

@nxnze can confirm about context: once you start stuffing the whole repo into the model, the quality of the solutions drops, especially with backend code.

5mo ago

Tabstack by Mozilla

@nxnze @nikita_iv any workaround/best practices to improve the overall quality of the output in this case?

5mo ago

@fmerian The best practice I’ve settled on is keeping an up-to-date structure md file (project tree + short module descriptions). I feed it at the start of the session so the model understands the architecture without having to read every file. Then I only add the specific 2-3 files that actually need changes. The less irrelevant code in the prompt, the fewer hallucinations you get.
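
One way to keep such a structure file up to date is to generate it. A rough Python sketch, assuming a Python codebase where each module's first docstring line serves as its short description; the function names and output layout are made up for illustration:

```python
import ast
from pathlib import Path

def module_summary(source: str) -> str:
    """First line of a module docstring, or a placeholder when there is none."""
    try:
        doc = ast.get_docstring(ast.parse(source))
    except SyntaxError:
        return "(could not parse)"
    if not doc or not doc.strip():
        return "(no description)"
    return doc.strip().splitlines()[0]

def build_structure_md(root: Path) -> str:
    """List every .py file under root, indented by depth, with its summary."""
    root = root.resolve()
    lines = [f"Project structure: {root.name}", ""]
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root)
        indent = "  " * (len(rel.parts) - 1)
        summary = module_summary(path.read_text(encoding="utf-8"))
        lines.append(f"{indent}- {rel.as_posix()}: {summary}")
    return "\n".join(lines) + "\n"
```

Writing the result of `build_structure_md(Path("."))` to a file before each session gives the model the architecture overview, after which only the 2-3 files that need changes are added by hand.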

5mo ago

Agree that model selection depends heavily on the task. We found that the cost difference between models on the same task can be huge. Using aicosts.ai to track per-model cost per task type helped us route work intelligently. Sonnet for complex refactoring, a cheaper model for tests and linting. Same output quality, 40% less spend.
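
Routing by task type can be sketched in a few lines of Python. The model names and prices below are placeholders chosen for easy arithmetic, not real quotes, and the task categories are assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float  # placeholder price, not a real quote

# Hypothetical models, for illustration only.
STRONG = Model("strong-model", 10.0)
CHEAP = Model("cheap-model", 2.5)

# Task types mapped to the model that handles them.
ROUTES = {"refactor": STRONG, "tests": CHEAP, "lint": CHEAP}

def route(task_type: str) -> Model:
    """Unknown task types fall back to the strong model."""
    return ROUTES.get(task_type, STRONG)

def cost_usd(jobs, pick=route) -> float:
    """Total cost of (task_type, tokens) pairs under a given routing function."""
    return sum(pick(task).usd_per_million_tokens * tokens / 1_000_000
               for task, tokens in jobs)

jobs = [("refactor", 1_000_000), ("tests", 500_000), ("lint", 500_000)]
routed = cost_usd(jobs)                               # 12.5 with these placeholders
all_strong = cost_usd(jobs, pick=lambda _t: STRONG)   # 20.0 with these placeholders
```

The saving depends entirely on the price gap and on how much of the workload is routine, so the figures here say nothing about any real model.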

25d ago

@nxnze For me, Sonnet 4.6 offers the best value for money; its coding capabilities are good. Opus consumes too many tokens, so I can't afford it.

4mo ago

@nxnze Genuinely good advice. I wish I had found it 2 months earlier; I had to learn the hard way, burning tokens like crazy.
But the steps you describe are the gold standard. They work every time.

2mo ago

MUBR

They say it's Claude, but in Cursor I never see the difference, especially because ChatGPT has a bigger context window.

5mo ago

Tabstack by Mozilla

@ilichev any thoughts on Composer 1?

5mo ago

Lovon AI therapy

@ilichev but it's also about speed of development. Try using Composer instead of ChatGPT: when you have 5 agents coding at the same time, it gives you a significant productivity boost without quality loss.

5mo ago

Not that simple in my experience. It depends on tech stack and task complexity (I see different winners on different datasets). Overall, I’d go with Claude Opus 4.5 for complex tasks, but Gemini Pro for UI/frontend and for most tasks, because of its speed.

5mo ago

Tabstack by Mozilla

"depends on tech stack and task complexity"

Exactly. @Next.js, for example, recently launched its own benchmark, updated daily, and currently @OpenAI's GPT 5.3 Codex (xhigh) is the most performant, achieving 90% on evals out of the box. [1]

[1]: Performance results of AI coding agents on Next.js

4mo ago

Hyperaide

Opus 4.5?

5mo ago

Tabstack by Mozilla

@dynamo my favorite. to quote @rauchg, Opus is "on a different level, unreasonably good at @Next.js" - source

5mo ago

@dynamo it's the best, but also the most expensive.

5mo ago

Does pricing also count as a criterion? 😅 Sonnet is quite expensive. Running GLM 4.7 and Codex 5.2 full-time is only a few percent behind Sonnet and costs 3–7x less.

5mo ago

Tabstack by Mozilla

Haha yes! There are definitely many variables we could take into account: capabilities, speed, pricing...

5mo ago

@alina_petrova3 Absolutely, cost matters a lot.

5mo ago

Edgee

@alina_petrova3 I agree! There are alternatives to Sonnet, for sure. However, Sonnet is now recognized as the best (coupled with Claude Code) by a huge portion of developers, and I know many who don't want to take the risk of trying a cheaper model.
This is where token compression strategies come into play.
What if we could use Sonnet for the price of a Haiku?

4mo ago

@sachamorard Reputation matters, for sure, and many devs prefer sticking with what’s widely seen as the best. And Sonnet is definitely a high-quality model. BUT 😁 if you’re building a SaaS product powered by AI, pricing is important. It directly impacts margins, scalability, and how competitive you can be. If alternatives are only a few percent behind at 3–7x lower cost (+ you can train them to achieve better quality), that gap is hard to ignore.

If there is a way to bring Sonnet closer to Haiku pricing, that’s a different story. Then it becomes a smart optimization rather than a tradeoff.

4mo ago

Handle

For @Tonkotsu, we use Sonnet 4.5 under the covers (though we're always evaluating the best model) as we've found it's got the right mix of strong agentic coding performance while being relatively fast. Other models are also quite good but tend to have very high latency and go heads-down for a long time.

What we've learned from our users is that while they want to operate at a high level, they also want to see granular progress from the agents — classic manager behavior. Sonnet has the right mix of capability and speed for this.

5mo ago

Tabstack by Mozilla

@derekattonkotsu Oh I like the reasoning, i.e. finding a balance between speed and capabilities.

What would be the close 2nd? And if we look at the capabilities only, what would be the best model from your POV? Also, why not Opus 4.5!?

5mo ago

Handle

@fmerian Opus is definitely a contender. It's more expensive but very capable and at least in my experience not terribly slower. I've found Codex to be quite smart but it likes to go "heads-down" for a long time before coming back with an answer -- which doesn't fit with the pattern of usage that we see from our users.

5mo ago

Tabstack by Mozilla

@derekattonkotsu thoughts on the all-new Sonnet 4.6?

4mo ago

In Cursor, I have tried Opus 4.5 and GPT 5.2 in plan mode, and personally, I prefer the former. However, I’m still torn on the best setup for fixing bugs. What are your preferences for debug mode? Do you stick with the same model or switch to a new one?

5mo ago

@yuanyuan_zhang0104 For debugging, nothing beats GPT-5.2x High right now

5mo ago

Tabstack by Mozilla

@brightmirror oh good to know! thanks for the suggestion

5mo ago

In Cursor, we use Opus 4.5... I couldn't find a better model than that. Too bad it's not on the list.

5mo ago

Tabstack by Mozilla

@atomer any experiences with Composer 1?

5mo ago

@fmerian I tested it on small tasks and it’s … ok. But not for building full features.

5mo ago

Tabstack by Mozilla

"I tested it on small tasks and it’s … ok. But not for building full features."

@atomer @Claude by Anthropic ftw haha

5mo ago

Opus 4.5. The only downside is that it consumes tokens very quickly, so my second choice is Sonnet 4.5.

5mo ago

Tabstack by Mozilla

@Claude by Anthropic ftw!

5mo ago

I use Sonnet 4.5 the vast majority of the time. It is fast, precise, and very robust!

5mo ago

Tabstack by Mozilla

@Claude by Anthropic is leading the way

5mo ago

@fmerian For my side projects, I currently use DeepSeek for agentic work. I usually refine features in ChatGPT first, then hand them off to DeepSeek for execution. So far, the setup works really well with minimal cost—about $2–3 per day of coding, plus my ChatGPT subscription, which I’d have anyway even if I weren’t coding.

5mo ago
