Depends on what you expect from the model. For coding/agentic tasks there is SWE-bench (https://www.swebench.com/), which gives a better picture. MiniMax, GLM and Kimi K2 seem to be better models for this purpose than Qwen, and that matches my (limited) actual experience.