Opus Isn't Dethroned Yet
The benchmarks say the gap is closing. My day-to-day experience says otherwise.
There's been a lot of noise lately about models coming for Opus: Codex, GLM-5, Minimax-2.5. The benchmarks look close, the tweets are breathless, and if you spend any time skim-reading online you'd be forgiven for thinking the gap has closed.
I want this to be true, especially when my monthly Anthropic subscription puts a dent in my bank account. And that's before you get to the weekly limits and session caps, which are real constraints when using Claude Code close to full-time. If I could get comparable results from a cheaper model, I absolutely would switch. So I've been testing, but I keep coming back to the same conclusion.
Opus and Sonnet are still the best tools for the job, and not by a little.
What "not as good" actually means
The tricky part is quantifying what "not as good" actually means. It's not something I take notes on while I work. The gaps don't usually announce themselves as dramatic failures (although that does happen); they tend to show up as friction. With the non-Anthropic models, I find myself adding more context, more clarification, more instruction just to get to the same place. Opus and Sonnet seem to read the situation, fill in the gaps, and get on with it.
That ability to read between the lines is, I suspect, exactly what benchmarks miss, yet it's one of the most important ways I judge a model. Opus feels proactive without trying to boil the ocean, and helpful without being pushy. I don't have to spell everything out, and when I do give direction, it picks up the thread and runs in the right direction.
Where the individual models land
Codex is the closest thing to a real contender. It's usable, genuinely so, and if I've burned through my Opus and Sonnet allowance for the week it will get work done. If I'd never used the Anthropic models, I'd probably think it was pretty good.
Minimax-2.5 and GLM-5 are fast and cheap, and they'll handle a clean refactor, but they need to be told exactly what to do. Neither has the intuition for ambiguous work.
Writing code is the easy part
They can all write code. Where they actually differ is everything that surrounds it: inferring context from a Jira ticket, reading files, navigating a codebase, running commands, pulling in context from external services and logs. That's most of what I'm actually doing day to day, and that's where Minimax-2.5 and GLM-5 really fall apart. Codex handles it better than either of them, but it still doesn't get close to what Opus and Sonnet do in those situations.
I'll keep testing as things develop. I genuinely hope something closes the gap, because running Opus and Sonnet full-time is expensive. But right now, nothing else makes my day easier quite like Opus and Sonnet do.