TechCrunch News, January 9
AI researcher François Chollet is co-founding a nonprofit to build benchmarks for AGI

 

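Former Google engineer François Chollet has co-founded the ARC Prize Foundation, which aims to develop benchmarks that evaluate AI for “human-level” intelligence. Led by Greg Kamradt, the foundation will expand on ARC-AGI, a test Chollet developed to measure whether an AI system can acquire new skills outside the data it was trained on. ARC-AGI consists of puzzle-like problems that require an AI to generate the correct “answer” grid from a collection of different-colored squares, testing its adaptability. Although OpenAI’s o3 model has made progress on ARC-AGI, Chollet does not believe it has human-level intelligence. The foundation plans to launch a second-generation ARC-AGI benchmark and then design a third, to drive progress toward artificial general intelligence.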

🏆 The ARC Prize Foundation, co-founded by former Google engineer François Chollet, focuses on developing benchmarks that evaluate AI for “human-level” intelligence.

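🧩 The foundation will expand on ARC-AGI, a test that uses puzzle-like problems to evaluate whether an AI can acquire new skills outside the data it was trained on, as a measure of adaptability and generality.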

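🤖 Although OpenAI’s o3 model has shown some capability on ARC-AGI, Chollet believes it still falls short of human-level intelligence, and he acknowledges that the test has a flaw: it can be brute-forced.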

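🚀 The foundation plans to launch a second-generation ARC-AGI benchmark and begin designing a third to keep driving progress toward artificial general intelligence. OpenAI has said it intends to partner with the ARC-AGI team on future benchmarks.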

🤔 ARC-AGI’s value as a benchmark toward AGI has been questioned, and debate continues over how to define AGI and whether it has already been achieved.

Former Google engineer and influential AI researcher François Chollet is co-founding a nonprofit to help develop benchmarks that’ll probe AI for “human-level” intelligence.

The nonprofit, the ARC Prize Foundation, will be led by Greg Kamradt, an ex-Salesforce engineering director and founder of the AI product studio Leverage. Kamradt will serve as president and a member of the board.

“[W]e’re growing … into a proper nonprofit foundation to act as a useful north star toward artificial general intelligence,” Chollet wrote in a post on the nonprofit’s website. (Artificial general intelligence is a nebulous term, but it’s commonly understood to mean AI that can perform most tasks humans can.) “[W]e are trying to inspire progress by promoting [the gap] in basic human capability.”

The ARC Prize Foundation will expand on ARC-AGI, a test developed by Chollet to evaluate whether an AI system can efficiently acquire new skills outside the data it was trained on.

Chollet introduced ARC-AGI, short for “Abstraction and Reasoning Corpus for Artificial General Intelligence,” in 2019. Many AI systems can ace Math Olympiad exams and figure out potential solutions to PhD-level problems. But until this year, the best-performing AI could solve just under a third of the tasks in ARC-AGI.

“Unlike most frontier AI benchmarks, we are not trying to measure AI risk with superhuman exam questions,” Chollet wrote in the post. “Future versions of the ARC-AGI benchmark will focus on shrinking [the human capability] gap towards zero.”

ARC-AGI consists of puzzle-like problems where an AI has to generate the correct “answer” grid from a collection of different-colored squares. The problems were designed to force an AI to adapt to new problems it hasn’t seen before.
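To make the puzzle format concrete, here is a minimal Python sketch of what one such task might look like. The article does not describe the benchmark's data format or scoring, so everything here is an assumption for illustration: grids are modeled as lists of integer color codes, the toy task and its "mirror left to right" rule are invented, and the names `TRAIN_PAIRS`, `mirror`, `fits_examples`, and `score` are hypothetical, not part of any ARC-AGI tooling.

```python
from typing import Callable, List, Tuple

# Assumption: a grid is a list of rows, each cell an integer color code.
Grid = List[List[int]]

# Invented toy task (not from the real benchmark): the hidden rule is
# "mirror the grid left to right". A solver sees a few worked pairs...
TRAIN_PAIRS: List[Tuple[Grid, Grid]] = [
    ([[1, 0, 0],
      [0, 2, 0]],
     [[0, 0, 1],
      [0, 2, 0]]),
    ([[3, 3, 0],
      [0, 0, 4]],
     [[0, 3, 3],
      [4, 0, 0]]),
]

# ...and must then produce the answer grid for an unseen input.
TEST_INPUT: Grid = [[5, 0], [0, 6], [7, 0]]
TEST_OUTPUT: Grid = [[0, 5], [6, 0], [0, 7]]


def mirror(grid: Grid) -> Grid:
    """Candidate rule: reverse every row of the grid."""
    return [list(reversed(row)) for row in grid]


def fits_examples(rule: Callable[[Grid], Grid],
                  pairs: List[Tuple[Grid, Grid]]) -> bool:
    """Check that a candidate rule reproduces every worked example."""
    return all(rule(inp) == out for inp, out in pairs)


def score(predicted: Grid, expected: Grid) -> bool:
    """Assumed scoring: the answer counts only if every cell matches."""
    return predicted == expected


if __name__ == "__main__":
    if fits_examples(mirror, TRAIN_PAIRS):
        print("solved:", score(mirror(TEST_INPUT), TEST_OUTPUT))
```

Because the rule has to be inferred from a handful of examples rather than recalled from training data, a format like this rewards adaptation over memorization, which is the property the article attributes to the benchmark.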

Last June, Chollet and Zapier co-founder Mike Knoop kicked off a competition to build an AI capable of besting ARC-AGI. OpenAI’s unreleased o3 model was the first to achieve a qualifying score — but only with an extraordinary amount of computing power.

Chollet has made it clear that ARC-AGI has flaws (many models have been able to brute force their way to high scores) and that he doesn’t believe o3 possesses human-level intelligence.

“[E]arly data points suggest that the upcoming [successor to the ARC-AGI] benchmark will still pose a significant challenge to o3, potentially reducing its score to under 30% even at high compute (while a smart human would still be able to score over 95% with no training),” Chollet said in a statement last December. “You’ll know artificial general intelligence is here when the exercise of creating tasks that are easy for regular humans but hard for AI becomes simply impossible.”

Knoop says that the plan is to launch a second-gen ARC-AGI benchmark this year alongside a new competition. The nonprofit will also embark on designing the third edition of ARC-AGI.

It remains to be seen how the ARC Prize Foundation addresses the criticism Chollet has faced for overselling ARC-AGI as a benchmark toward reaching AGI. The very definition of AGI is being hotly contested now; one OpenAI staff member recently claimed that AGI has “already” been achieved if one defines AGI as AI “better than most humans at most tasks.”

Interestingly, OpenAI CEO Sam Altman said in December that the company intends to partner with the ARC-AGI team to build future benchmarks. Chollet gave no update on a possible partnership in today’s announcement.
