A.I. News Move over DeepSeek: Alibaba's Qwen2.5-Max surpasses DeepSeek-V3 in benchmarks

Gandalf_The_Grey

News headlines over the last week have been dominated by DeepSeek thanks to the launch of its new reasoning model, R1, which improves responses to queries. DeepSeek's main non-reasoning model, DeepSeek-V3, arrived in December with impressive benchmark scores of its own, but now Chinese firm Alibaba has released Qwen2.5-Max, which surpasses DeepSeek-V3 and, in some tests, GPT-4o-0806 and Claude-3.5-Sonnet-1022.

Similar to DeepSeek, Qwen2.5-Max is touchy about Chinese political issues and doesn't answer those questions. On Qwen Chat, it just says you've exceeded your quota limit when you try those queries, but it answers fine when you change the topic.

Benchmarks that Alibaba used to test its model against the competition included MMLU-Pro, which tests knowledge through college-level problems; LiveCodeBench, which assesses coding capabilities; LiveBench, which comprehensively tests general capabilities; and Arena-Hard, which approximates human preferences.
 
