TechCrunch News, November 28, 2024
Alibaba releases an ‘open’ challenger to OpenAI’s o1 reasoning model

Alibaba has released a new reasoning AI model called QwQ-32B-Preview, which contains 32.5 billion parameters, can handle prompts of up to roughly 32,000 words, and outperforms OpenAI's o1 model on certain benchmarks. The model does well on the AIME and MATH tests and can solve logic puzzles and challenging math problems, but it also has limitations, such as unexpectedly switching languages or getting stuck in loops. QwQ-32B-Preview uses a technique called test-time compute, which gives the model additional processing time while performing a task, allowing it to effectively fact-check itself and avoid some common mistakes. The model is also subject to review by China's internet regulator and therefore declines to answer certain sensitive topics. QwQ-32B-Preview is released openly under an Apache 2.0 license, but only some of its components have been published, so it cannot be fully replicated and its inner workings cannot be examined in depth.

🤔 QwQ-32B-Preview is a 32.5-billion-parameter reasoning AI model developed by Alibaba's Qwen team; it outperforms OpenAI's o1 models on the AIME and MATH tests and can solve logic puzzles and math problems.

🔎 QwQ-32B-Preview uses test-time compute, which gives the model more processing time while performing a task, allowing it to effectively fact-check itself and avoid some common errors.

🇨🇳 QwQ-32B-Preview is subject to review by China's internet regulator and declines to answer sensitive topics (such as Taiwan or Tiananmen Square) so that its responses "embody core socialist values."

🔓 QwQ-32B-Preview is released under an Apache 2.0 license, but only some components have been published, so the model cannot be fully replicated and its inner workings remain opaque.

📈 The rise of reasoning models is tied to growing doubts about the viability of "scaling laws"; major AI labs are exploring new approaches, architectures, and development techniques, with test-time compute emerging as a new direction.

A new “reasoning” AI model, QwQ-32B-Preview, has arrived on the scene. It’s one of the few to rival OpenAI’s o1, and it’s the first available to download under a permissive license.

Developed by Alibaba’s Qwen team, QwQ-32B-Preview, which contains 32.5 billion parameters and can consider prompts up to ~32,000 words in length, performs better on certain benchmarks than o1-preview and o1-mini, the two reasoning models that OpenAI has released to date. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.

Per Alibaba’s testing, QwQ-32B-Preview beats OpenAI’s o1 models on the AIME and MATH tests. AIME is a challenging high school math competition (the American Invitational Mathematics Examination), while MATH is a collection of competition-level math word problems.

QwQ-32B-Preview can solve logic puzzles and answer reasonably challenging math questions, thanks to its “reasoning” capabilities. But it isn’t perfect. Alibaba notes in a blog post that the model might switch languages unexpectedly, get stuck in loops, and underperform on tasks that require “common sense reasoning.”

Image Credits: Alibaba

Unlike most AI, QwQ-32B-Preview and other reasoning models effectively fact-check themselves. This helps them avoid some of the pitfalls that normally trip up models, with the downside being that they often take longer to arrive at solutions. Similar to o1, QwQ-32B-Preview reasons through tasks, planning ahead and performing a series of actions that help the model tease out answers.

QwQ-32B-Preview, which can be run on and downloaded from the AI dev platform Hugging Face, appears to be similar to the recently released DeepSeek reasoning model in that certain topics are verboten. Alibaba and DeepSeek, being Chinese companies, are subject to benchmarking by China’s internet regulator to ensure their models’ responses “embody core socialist values.” Many Chinese AI systems decline to respond to topics that might raise the ire of regulators, like speculation about the Xi Jinping regime.
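For readers who want to try the model themselves, here is a minimal sketch of downloading and running it with the Hugging Face transformers library. The repo ID Qwen/QwQ-32B-Preview and the chat-template call are assumptions based on the Qwen team's usual release conventions, not details from this article, so check the model card before running.

```python
# Minimal sketch: load QwQ-32B-Preview from Hugging Face and ask it a question.
# The repo ID below is an assumption; verify it on the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-Preview"  # assumed Hugging Face repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # spread the 32.5B parameters across available GPUs
)

messages = [{"role": "user", "content": "How many positive divisors does 360 have?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit a long chain of intermediate steps before the final
# answer, so leave generous room for new tokens.
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```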

Image Credits: Alibaba

Asked “Is Taiwan a part of China?,” QwQ-32B-Preview answered that it was, a perspective out of step with most of the world but in line with that of China’s ruling party. Prompts about Tiananmen Square, meanwhile, yielded a non-response.

Image Credits: Alibaba

QwQ-32B-Preview is “openly” available under an Apache 2.0 license, meaning it can be used for commercial applications. But only certain components of the model have been released, making it impossible to replicate QwQ-32B-Preview or gain much insight into the system’s inner workings.

The increased attention on reasoning models comes as the viability of “scaling laws,” long-held theories that throwing more data and computing power at a model will continuously increase its capabilities, is coming under scrutiny. A flurry of press reports suggest that models from major AI labs including OpenAI, Google, and Anthropic aren’t improving as dramatically as they once did.

That’s led to a scramble for new AI approaches, architectures, and development techniques. One is test-time compute, which underpins models like o1 and DeepSeek’s. Also known as inference compute, test-time compute essentially gives models additional processing time to complete tasks.
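To make “additional processing time” concrete, below is a minimal, model-agnostic sketch of one common test-time-compute strategy: sample several independent reasoning traces and keep the majority answer (self-consistency). The generate_answer helper is hypothetical, and the article does not say this is the exact mechanism used by o1, DeepSeek, or QwQ-32B-Preview.

```python
# Sketch of test-time compute via self-consistency: spend more inference
# (more sampled reasoning traces) to get a more reliable final answer.
from collections import Counter

def generate_answer(prompt: str, temperature: float = 0.7) -> str:
    """Hypothetical call into a reasoning model; returns its final answer string."""
    raise NotImplementedError("wire this up to your model or API of choice")

def answer_with_test_time_compute(prompt: str, num_samples: int = 8) -> str:
    # More samples means more inference-time compute and, usually, higher accuracy.
    answers = [generate_answer(prompt) for _ in range(num_samples)]
    most_common_answer, _count = Counter(answers).most_common(1)[0]
    return most_common_answer
```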

Big labs besides OpenAI and Chinese ventures are betting it’s the future. According to a recent report from The Information, Google recently expanded its reasoning team to about 200 people and added computing power.
