Depends on what you expect from the model. For coding/agentic tasks there is SWE-bench (https://www.swebench.com/), which gives a better picture. MiniMax, GLM and Kimi K2 seem to be better models for this purpose than Qwen, and that matches my (limited) actual experience.