AI News, 9 August 2024
Qwen2-Math: A new era for AI maths whizzes

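Alibaba Cloud's Qwen team has released Qwen2-Math, a series of large language models built specifically to solve complex mathematical problems. The models perform strongly across a range of mathematical tasks, and the team has outlined plans for further development.

🎯 Qwen2-Math is built on the existing Qwen2 foundation, excels at arithmetic and mathematical problems, and outperforms former industry leaders. It was trained on a mathematics-specific corpus drawn from a variety of high-quality sources.

📈 In rigorous evaluation, Qwen2-Math performed strongly on English and Chinese mathematical benchmarks. The flagship model, Qwen2-Math-72B-Instruct, outperformed proprietary models such as GPT-4o and Claude 3.5 on a range of mathematical tasks.

💪 A math-specific reward model was used effectively during development, and Qwen2-Math achieved strong results in competitions such as AIME 2024 and AMC 2023. The team also applied strict decontamination methods to safeguard the model's accuracy and reliability.

🌍 The Qwen team plans to extend Qwen2-Math's language coverage with bilingual and multilingual models, making advanced mathematical problem-solving available to a global audience.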

Alibaba Cloud’s Qwen team has unveiled Qwen2-Math, a series of large language models specifically designed to tackle complex mathematical problems.

These new models – built upon the existing Qwen2 foundation – demonstrate remarkable proficiency in solving arithmetic and mathematical challenges, and outperform former industry leaders.

The Qwen team trained Qwen2-Math on a large and diverse mathematics-specific corpus. The corpus comprises high-quality resources, including web texts, books, code, exam questions, and synthetic data generated by Qwen2 itself.

Rigorous evaluation on both English and Chinese mathematical benchmarks – including GSM8K, MATH, MMLU-STEM, CMATH, and GaoKao Math – revealed the exceptional capabilities of Qwen2-Math. Notably, the flagship model, Qwen2-Math-72B-Instruct, surpassed the performance of proprietary models such as GPT-4o and Claude 3.5 in various mathematical tasks.

“Qwen2-Math-Instruct achieves the best performance among models of the same size, with RM@8 outperforming Maj@8, particularly in the 1.5B and 7B models,” the Qwen team noted.

This superior performance is attributed to the effective implementation of a math-specific reward model during the development process.
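To make the two metrics in the quote concrete, here is a minimal sketch of how they differ, assuming the usual reading of the terms: Maj@8 takes the most common final answer among 8 sampled solutions, while RM@8 takes the answer from the sample that a reward model scores highest. The function names, the sampled answers, and the scores below are all invented for illustration and are not taken from the Qwen team's evaluation code.

```python
from collections import Counter


def maj_at_n(answers):
    """Majority voting: return the most frequent final answer."""
    return Counter(answers).most_common(1)[0][0]


def rm_at_n(answers, scores):
    """Reward-model selection: return the answer with the highest score."""
    best_index = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_index]


# Hypothetical final answers from 8 sampled solutions to one problem,
# with made-up reward-model scores (higher means the model prefers it).
samples = ["12", "15", "12", "12", "15", "9", "12", "15"]
scores = [0.20, 0.91, 0.35, 0.30, 0.88, 0.05, 0.25, 0.80]

maj_answer = maj_at_n(samples)         # the answer that appears most often
rm_answer = rm_at_n(samples, scores)   # the answer from the top-scored sample

print("Maj@8 picks:", maj_answer)
print("RM@8 picks:", rm_answer)
```

In this toy case the two strategies disagree: the vote goes to the most frequent answer, while the reward model backs a less frequent one. RM@8 can beat Maj@8 only when the reward model reliably recognises correct solutions that are in the minority.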

Further showcasing its prowess, Qwen2-Math demonstrated impressive results in challenging mathematical competitions like the American Invitational Mathematics Examination (AIME) 2024 and the American Mathematics Competitions (AMC) 2023.

To ensure the model’s integrity and prevent contamination, the Qwen team implemented robust decontamination methods during both the pre-training and post-training phases. This rigorous approach involved removing duplicate samples and identifying overlaps with test sets to maintain the model’s accuracy and reliability.
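The two steps described here, removing duplicates and detecting overlap with test sets, can be sketched roughly as follows. This is an illustrative approximation only: the article does not give the Qwen team's exact procedure, so the whitespace tokenisation, the n-gram length `n=5`, and the function names are assumptions chosen to keep the example small.

```python
def ngrams(text, n):
    """Return the set of word-level n-grams in a lowercased text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def decontaminate(train_samples, test_samples, n=5):
    """Drop exact duplicates and samples sharing any n-gram with a test item.

    Samples shorter than n tokens have no n-grams, so this sketch never
    flags them as overlapping.
    """
    test_grams = set()
    for item in test_samples:
        test_grams |= ngrams(item, n)

    seen = set()
    kept = []
    for sample in train_samples:
        key = " ".join(sample.lower().split())  # normalise case and spacing
        if key in seen:
            continue  # duplicate of an earlier training sample
        seen.add(key)
        if ngrams(sample, n) & test_grams:
            continue  # overlaps with a benchmark question
        kept.append(sample)
    return kept


# Hypothetical data, invented for illustration.
train = [
    "What is 2 plus 2 ?",
    "what is 2 plus 2 ?",
    "Find the area of a circle with radius 3 units",
    "Solve x squared equals nine for x",
]
test = ["Please find the area of a circle with radius 3"]

clean = decontaminate(train, test)
print(clean)
```

Here the second sample is dropped as a duplicate and the third because it shares a 5-gram with the test question. A production pipeline would typically use a proper tokeniser and a longer n-gram window to reduce false positives.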

Looking ahead, the Qwen team plans to expand Qwen2-Math’s capabilities beyond English, with bilingual and multilingual models in the pipeline. This commitment to inclusivity aims to make advanced mathematical problem-solving accessible to a global audience.

“We will continue to enhance our models’ ability to solve complex and challenging mathematical problems,” affirmed the Qwen team.

You can find the Qwen2 models on Hugging Face.


The post Qwen2-Math: A new era for AI maths whizzes appeared first on AI News.
