AI researcher François Chollet is co-founding a nonprofit to build benchmarks for AGI

Former Google engineer and influential AI researcher François Chollet is co-founding a nonprofit to help develop benchmarks that’ll probe AI for “human-level” intelligence.

The nonprofit, the ARC Prize Foundation, will be led by Greg Kamradt, an ex-Salesforce engineering director and founder of the AI product studio Leverage. Kamradt will serve as president and a member of the board.

“[W]e’re growing … into a proper nonprofit foundation to act as a useful north star toward artificial general intelligence,” Chollet wrote in a post on the nonprofit’s website. (Artificial general intelligence is a nebulous term, but it’s commonly understood to mean AI that can perform most tasks humans can.) “[W]e are trying to inspire progress by promoting [the gap] in basic human capability.”

The ARC Prize Foundation will expand on ARC-AGI, a test developed by Chollet to evaluate whether an AI system can efficiently acquire new skills outside the data it was trained on.

Chollet introduced ARC-AGI, short for “Abstraction and Reasoning Corpus for Artificial General Intelligence,” in 2019. Many AI systems can ace Math Olympiad exams and figure out potential solutions to PhD-level problems. But until this year, the best-performing AI could solve just under a third of the tasks in ARC-AGI.

“Unlike most frontier AI benchmarks, we are not trying to measure AI risk with superhuman exam questions,” Chollet wrote in the post. “Future versions of the ARC-AGI benchmark will focus on shrinking [the human capability] gap towards zero.”

ARC-AGI consists of puzzle-like problems where an AI has to generate the correct “answer” grid from a collection of different-colored squares. The problems were designed to force an AI to adapt to new problems it hasn’t seen before.
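
As a rough illustration, assuming the publicly released ARC task format (a few demonstration input/output grid pairs plus a test input, with each cell an integer color code), a toy task and an exact-match check might look like the sketch below. The task itself, the mirroring rule, and the helper names `mirror_horizontally`, `fits_demonstrations`, and `score` are invented for illustration and are not taken from the benchmark.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # each cell is an integer color code

# Hypothetical toy task: the hidden rule is "mirror each row left to right".
toy_task: Dict[str, list] = {
    "train": [
        {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
        {"input": [[0, 3], [4, 0]], "output": [[3, 0], [0, 4]]},
    ],
    "test": [
        {"input": [[5, 0, 6], [0, 7, 0]], "output": [[6, 0, 5], [0, 7, 0]]},
    ],
}


def mirror_horizontally(grid: Grid) -> Grid:
    """Candidate rule: reverse every row of the grid."""
    return [list(reversed(row)) for row in grid]


def fits_demonstrations(solver: Callable[[Grid], Grid], task: Dict[str, list]) -> bool:
    """Check that a candidate rule reproduces every demonstration pair."""
    return all(solver(pair["input"]) == pair["output"] for pair in task["train"])


def score(solver: Callable[[Grid], Grid], task: Dict[str, list]) -> float:
    """Fraction of test grids reproduced exactly; a near miss counts as wrong."""
    tests = task["test"]
    solved = sum(solver(pair["input"]) == pair["output"] for pair in tests)
    return solved / len(tests)


if __name__ == "__main__":
    print(fits_demonstrations(mirror_horizontally, toy_task))
    print(score(mirror_horizontally, toy_task))
```

The point of the sketch is that the rule is never stated: a solver has to infer it from a handful of demonstrations and then produce the entire answer grid correctly.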

Last June, Chollet and Zapier co-founder Mike Knoop kicked off a competition to build an AI capable of besting ARC-AGI. OpenAI’s unreleased o3 model was the first to achieve a qualifying score — but only with an extraordinary amount of computing power.

Chollet has made it clear that ARC-AGI has flaws (many models have been able to brute-force their way to high scores) and that he doesn’t believe that o3 possesses human-level intelligence.

“[E]arly data points suggest that the upcoming [successor to the ARC-AGI] benchmark will still pose a significant challenge to o3, potentially reducing its score to under 30% even at high compute (while a smart human would still be able to score over 95% with no training),” Chollet said in a statement last December. “You’ll know artificial general intelligence is here when the exercise of creating tasks that are easy for regular humans but hard for AI becomes simply impossible.”

Knoop says that the plan is to launch a second-gen ARC-AGI benchmark this year alongside a new competition. The nonprofit will also embark on designing the third edition of ARC-AGI.

It remains to be seen how the ARC Prize Foundation addresses the criticism Chollet has faced for overselling ARC-AGI as a measure of progress toward AGI. The very definition of AGI is being hotly contested now; one OpenAI staff member recently claimed that AGI has “already” been achieved if one defines AGI as AI “better than most humans at most tasks.”

Interestingly, OpenAI CEO Sam Altman said in December that the company intends to partner with the ARC-AGI team to build future benchmarks. Chollet gave no update on a possible partnership in today’s announcement.
