AI News, March 26, 00:47
ARC Prize launches its toughest AI benchmark yet: ARC-AGI-2

ARC Prize has released the challenging ARC-AGI-2 benchmark and announced its 2025 competition, with total prizes of $1 million. The benchmark measures AI's ability to solve tasks that humans handle well but AI struggles with, with the goal of advancing artificial general intelligence. Unlike earlier benchmarks focused on superhuman capabilities, ARC-AGI-2 emphasizes adaptability and efficiency, aiming to close the gap between humans and machines and to spur researchers toward innovative solutions and breakthroughs in AI.

🧠 ARC-AGI-2 measures AI's adaptability and efficiency, challenging AI on tasks that humans find easy but AI finds hard. Its design philosophy is to select tasks that are relatively easy for humans yet difficult or impossible for AI.

🧩 ARC-AGI-2 focuses on AI's capabilities in symbolic interpretation, compositional reasoning, and contextual rule application. AI currently performs poorly in these areas: it struggles to grasp the semantics of symbols, to apply multiple rules at once, and to apply different rules depending on complex context.

💰 The ARC Prize 2025 competition offers $1 million in total prizes across several categories, including a $700,000 Grand Prize for reaching an 85% success rate within Kaggle efficiency limits. The contest aims to incentivize researchers to build more efficient AI systems and drive progress on the ARC-AGI-2 challenge.

💡 ARC Prize stresses that efficiency is a key measure of AI intelligence. Real-world examples show that humans are far more efficient than current AI systems on ARC-AGI-2 tasks, highlighting the gap in adaptability and resource consumption.

ARC Prize has launched the demanding ARC-AGI-2 benchmark, accompanied by the announcement of its 2025 competition with $1 million in prizes.

As AI progresses from performing narrow tasks to demonstrating general, adaptive intelligence, the ARC-AGI-2 challenges aim to uncover capability gaps and actively guide innovation.

“Good AGI benchmarks act as useful progress indicators. Better AGI benchmarks clearly discern capabilities. The best AGI benchmarks do all this and actively inspire research and guide innovation,” the ARC Prize team states.

ARC-AGI-2 is setting out to achieve the “best” category.

Beyond memorisation

Since its inception in 2019, ARC Prize has served as a “North Star” for researchers striving toward AGI by creating enduring benchmarks. 

Benchmarks like ARC-AGI-1 leaned into measuring fluid intelligence (i.e., the ability to adapt learning to new, unseen tasks). This represented a clear departure from datasets that reward memorisation alone.

ARC Prize’s mission is also forward-thinking, aiming to accelerate timelines for scientific breakthroughs. Its benchmarks are designed not just to measure progress but to inspire new ideas.

Researchers observed a critical shift with the debut of OpenAI’s o3 in late 2024, evaluated using ARC-AGI-1. Combining deep learning-based large language models (LLMs) with reasoning synthesis engines, o3 marked a breakthrough where AI transitioned beyond rote memorisation.

Yet, despite progress, systems like o3 remain inefficient and require significant human oversight during training processes. To challenge these systems for true adaptability and efficiency, ARC Prize introduced ARC-AGI-2.

ARC-AGI-2: Closing the human-machine gap

The ARC-AGI-2 benchmark is tougher for AI yet retains its accessibility for humans. While frontier AI reasoning systems continue to score in single-digit percentages on ARC-AGI-2, humans can solve every task in under two attempts.

So, what sets ARC-AGI apart? Its design philosophy chooses tasks that are “relatively easy for humans, yet hard, or impossible, for AI.”

The benchmark includes datasets with varying visibility, probing capabilities such as symbolic interpretation, compositional reasoning, and contextual rule application.
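ARC-AGI tasks are published as JSON: each task holds "train" demonstration pairs and "test" pairs, where inputs and outputs are small grids of integers. The sketch below loads a toy task in that format and checks a solver against it. The `solve` function is a hypothetical placeholder illustrating the kind of rule (here, a left-right flip) a task might demonstrate, not a real ARC solution method.

```python
import json

# A toy task in the public ARC JSON shape: train/test pairs of int grids.
task_json = """
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
    {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]}
  ],
  "test": [
    {"input": [[3, 0], [0, 3]], "output": [[0, 3], [3, 0]]}
  ]
}
"""
task = json.loads(task_json)

def solve(grid):
    """Hypothetical solver: the rule demonstrated here is a left-right flip."""
    return [list(reversed(row)) for row in grid]

# A prediction counts only if the output grid matches exactly.
for pair in task["train"] + task["test"]:
    assert solve(pair["input"]) == pair["output"]
print("all grids matched")
```

Exact-match grading is what makes the tasks easy to verify yet hard to brute-force: the solver must infer the full rule from a handful of demonstrations.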

Most existing benchmarks focus on superhuman capabilities, testing advanced, specialised skills at scales unattainable for most individuals. 

ARC-AGI flips the script, highlighting what AI can't yet do: specifically, the adaptability that defines human intelligence. When the gap between tasks that are easy for humans but difficult for AI eventually reaches zero, AGI can be declared achieved.

However, achieving AGI isn’t limited to the ability to solve tasks; efficiency – the cost and resources required to find solutions – is emerging as a crucial defining factor.

The role of efficiency

Measuring performance by cost per task is essential to gauge intelligence as not just problem-solving capability but the ability to do so efficiently.
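A cost-per-task framing can be sketched in a few lines. The function and all figures below are made up for illustration; they are not ARC Prize's leaderboard code or real numbers.

```python
# Illustrative sketch: report accuracy alongside cost per task,
# the efficiency framing described above. All values are hypothetical.
def leaderboard_entry(name, solved, attempted, total_cost_usd):
    accuracy = solved / attempted
    cost_per_task = total_cost_usd / attempted
    return {
        "system": name,
        "accuracy": round(accuracy, 3),
        "cost_per_task_usd": round(cost_per_task, 2),
    }

entry = leaderboard_entry(
    "hypothetical-solver", solved=4, attempted=100, total_cost_usd=200.0
)
print(entry)  # accuracy 0.04, cost per task $2.00
```

Under this framing, two systems with equal accuracy are no longer equal: the one that spends less compute per task ranks as the more intelligent solver.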

Real-world examples are already revealing efficiency gaps between humans and frontier AI systems.

These comparisons underline disparities in adaptability and resource consumption between humans and AI. ARC Prize has committed to reporting efficiency alongside scores on future leaderboards.

The focus on efficiency prevents brute-force solutions from being considered “true intelligence.”

Intelligence, according to ARC Prize, encompasses finding solutions with minimal resources—a quality distinctly human but still elusive for AI.

ARC Prize 2025

ARC Prize 2025 launches on Kaggle this week, promising $1 million in total prizes and showcasing a live leaderboard for open-source breakthroughs. The contest aims to drive progress toward systems that can efficiently tackle ARC-AGI-2 challenges. 

The prize categories, which have grown from their 2024 totals, include a $700,000 Grand Prize for reaching an 85% success rate within the Kaggle efficiency limits.

These incentives ensure fair and meaningful progress while fostering collaboration among researchers, labs, and independent teams.

Last year, ARC Prize 2024 drew 1,500 competing teams and produced 40 papers with notable industry influence. This year's increased stakes aim to spur even greater success.

ARC Prize believes progress hinges on novel ideas rather than merely scaling existing systems. The next breakthrough in efficient general systems might not originate from current tech giants but from bold, creative researchers embracing complexity and curious experimentation.

(Image credit: ARC Prize)

See also: DeepSeek V3-0324 tops non-reasoning AI models in open-source first

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

