I've used a wide variety of the "best" models, and I've mostly settled on Opus 4 and Sonnet 4 with Claude Code, but they don't ever actually get better. Grok 3-4 and GPT4 were worse, but like, at a certain point you don't get brownie points for not tripping over how low the bar is set.