Gandalf_The_Grey
Thread author
The news headlines for the last week have been dominated by DeepSeek thanks to the launch of its new reasoning model, R1, which improves responses to queries. DeepSeek's main non-reasoning model, DeepSeek-V3, arrived in December with impressive benchmark scores of its own. Now, Chinese firm Alibaba has released Qwen2.5-Max, which surpasses DeepSeek-V3 and, in some tests, GPT-4o-0806 and Claude-3.5-Sonnet-1022.
Similar to DeepSeek, Qwen2.5-Max is touchy about Chinese political issues and doesn't answer those questions. On Qwen Chat, it just says you've exceeded your quota limit when you try those queries, but it answers fine when you change the topic.
The benchmarks Alibaba used to test its model against the competition included MMLU-Pro, which tests knowledge through college-level problems; LiveCodeBench, which assesses coding capabilities; LiveBench, which comprehensively tests general capabilities; and Arena-Hard, which approximates human preferences.
![www.neowin.net](https://cdn.neowin.com/news/images/uploaded/2025/01/1738149379_qwen2.5-max-banner_story.jpg)
Move over DeepSeek: Alibaba's Qwen2.5-Max surpasses DeepSeek-V3 in benchmarks
Alibaba has just released its latest model, Qwen2.5-Max. It surpasses DeepSeek-V3 in many benchmarks and can be tested out now on Qwen Chat.