AI News, January 29
Qwen 2.5-Max outperforms DeepSeek V3 in some benchmarks

Alibaba has released its latest MoE large model, Qwen 2.5-Max, pretrained on over 20 trillion tokens and fine-tuned with advanced techniques such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). The model performs strongly across benchmarks, notably outperforming DeepSeek V3 on Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond. The Qwen 2.5-Max API is available through Alibaba Cloud, and the model can be explored via the Qwen Chat platform. With it, Alibaba aims to enhance the fundamental thinking and reasoning abilities of AI systems and to push the boundaries of reinforcement learning toward solving complex problems beyond human intelligence.

🚀 Qwen 2.5-Max was pretrained on over 20 trillion tokens and fine-tuned with SFT and RLHF, giving it a strong model foundation.

📊 Across benchmarks, Qwen 2.5-Max surpasses DeepSeek V3 on key tests including Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond, while remaining competitive in other evaluations such as MMLU-Pro.

☁️ The Qwen 2.5-Max API is available through Alibaba Cloud; developers can obtain an API key via an Alibaba Cloud account and the Model Studio service. The API is compatible with the OpenAI ecosystem, making it easy to integrate into existing projects.

💬 Users can interact directly with Qwen 2.5-Max through the Qwen Chat platform to explore its search capabilities and its understanding of complex queries, reflecting Alibaba's emphasis on model accessibility.

Alibaba’s response to DeepSeek is Qwen 2.5-Max, the company’s latest Mixture-of-Experts (MoE) large-scale model.

Qwen 2.5-Max boasts pretraining on over 20 trillion tokens and fine-tuning through cutting-edge techniques like Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF).

With the API now available through Alibaba Cloud and the model accessible for exploration via Qwen Chat, the Chinese tech giant is inviting developers and researchers to see its breakthroughs firsthand.

Outperforming peers  

When comparing Qwen 2.5-Max’s performance against some of the most prominent AI models on a variety of benchmarks, the results are promising.

Evaluations included popular benchmarks like MMLU-Pro for college-level problem-solving, LiveCodeBench for coding expertise, LiveBench for overall capabilities, and Arena-Hard for assessing models against human preferences.

According to Alibaba, “Qwen 2.5-Max outperforms DeepSeek V3 in benchmarks such as Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond, while also demonstrating competitive results in other assessments, including MMLU-Pro.”

(Credit: Alibaba)

The instruct model – designed for downstream tasks like chat and coding – competes directly with leading models such as GPT-4o, Claude-3.5-Sonnet, and DeepSeek V3. Among these, Qwen 2.5-Max managed to outperform rivals in several key areas.

Comparisons of base models also yielded promising outcomes. While proprietary models like GPT-4o and Claude-3.5-Sonnet remained out of reach due to access restrictions, Qwen 2.5-Max was assessed against leading public options such as DeepSeek V3, Llama-3.1-405B (the largest open-weight dense model), and Qwen2.5-72B. Again, Alibaba’s newcomer demonstrated exceptional performance across the board.

“Our base models have demonstrated significant advantages across most benchmarks,” Alibaba stated, “and we are optimistic that advancements in post-training techniques will elevate the next version of Qwen 2.5-Max to new heights.”

Making Qwen 2.5-Max accessible  

To make the model more accessible to the global community, Alibaba has integrated Qwen 2.5-Max with its Qwen Chat platform, where users can interact directly with the model in various capacities—whether exploring its search capabilities or testing its understanding of complex queries.  

For developers, the Qwen 2.5-Max API is now available through Alibaba Cloud under the model name “qwen-max-2025-01-25”. Interested users can get started by registering an Alibaba Cloud account, activating the Model Studio service, and generating an API key.  

The API is even compatible with OpenAI’s ecosystem, making integration straightforward for existing projects and workflows. This compatibility lowers the barrier for those eager to test their applications with the model’s capabilities.
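Given that compatibility, a call to Qwen 2.5-Max can be sketched with the standard OpenAI-style chat-completion payload. The model name "qwen-max-2025-01-25" comes from the article; the base URL and the environment-variable name are assumptions for illustration.

```python
# Sketch: an OpenAI-compatible chat-completion request for Qwen 2.5-Max.
# BASE_URL is an assumed Alibaba Cloud compatible-mode endpoint, not
# confirmed by the article; the model name is quoted from it.
BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"  # assumption

def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload for Qwen 2.5-Max."""
    return {
        "model": "qwen-max-2025-01-25",  # model name from the article
        "messages": [{"role": "user", "content": prompt}],
    }

# With the official openai SDK, the call would then look roughly like:
#   from openai import OpenAI
#   client = OpenAI(api_key=os.environ["DASHSCOPE_API_KEY"], base_url=BASE_URL)
#   resp = client.chat.completions.create(**build_chat_request("Hello"))
```

Because the payload shape matches OpenAI's, an existing project can usually switch models by changing only the base URL, API key, and model name.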

Alibaba has made a strong statement of intent with Qwen 2.5-Max. The company’s ongoing commitment to scaling AI models is not just about improving performance benchmarks but also about enhancing the fundamental thinking and reasoning abilities of these systems.  

“The scaling of data and model size not only showcases advancements in model intelligence but also reflects our unwavering commitment to pioneering research,” Alibaba noted.  

Looking ahead, the team aims to push the boundaries of reinforcement learning to foster even more advanced reasoning skills. This, they say, could enable their models to not only match but surpass human intelligence in solving intricate problems.  

The implications for the industry could be profound. As scaling methods improve and Qwen models break new ground, we are likely to see further ripples across AI-driven fields globally, like those we have seen in recent weeks.

(Photo by Maico Amorim)

See also: ChatGPT Gov aims to modernise US government agencies

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Qwen 2.5-Max outperforms DeepSeek V3 in some benchmarks appeared first on AI News.
