TechCrunch News, February 8
DeepMind claims its AI performs better than International Mathematical Olympiad gold medalists

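AlphaGeometry2, a system developed by Google DeepMind, has surpassed the average gold medalist at solving geometry problems from the International Mathematical Olympiad (IMO). An improved version of AlphaGeometry, it can solve 84% of the IMO geometry problems from the past 25 years. DeepMind believes that solving complex geometry problems is key to building more capable AI. AlphaGeometry2 combines a Gemini language model with a symbolic engine: the Gemini model predicts useful constructions, and the symbolic engine reasons with mathematical rules to solve the problem. DeepMind created synthetic data to train AlphaGeometry2’s language model. The experimental results show that AlphaGeometry2 performs strongly on IMO problems but still has limitations, and that a hybrid of symbol manipulation and neural networks is a promising route to general-purpose AI.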

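🥇 AlphaGeometry2 is an AI system developed by Google DeepMind that outperforms the average human gold medalist at solving International Mathematical Olympiad geometry problems.

🧩 The system combines a Gemini language model with a symbolic engine: the Gemini model predicts useful constructions, and the symbolic engine reasons with mathematical rules to solve the problem efficiently.

📚 DeepMind created synthetic data containing more than 300 million theorems and proofs to train AlphaGeometry2’s language model, overcoming the shortage of training data.

💡 AlphaGeometry2 takes a hybrid approach that combines symbol manipulation with neural networks, which is seen as a promising path toward general-purpose AI.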

An AI system developed by Google DeepMind, Google’s leading AI research lab, appears to have surpassed the average gold medalist in solving geometry problems in an international mathematics competition.

The system, called AlphaGeometry2, is an improved version of a system, AlphaGeometry, that DeepMind released last January. In a newly published study, the DeepMind researchers behind AlphaGeometry2 claim their AI can solve 84% of all geometry problems over the last 25 years in the International Mathematical Olympiad (IMO), a math contest for high school students.

Why does DeepMind care about a high-school-level math competition? Well, the lab thinks the key to more capable AI might lie in discovering new ways to solve challenging geometry problems — specifically Euclidean geometry problems.

Proving mathematical theorems, or logically explaining why a theorem (e.g. the Pythagorean theorem) is true, requires both reasoning and the ability to choose from a range of possible steps toward a solution. These problem-solving skills could — if DeepMind’s right — turn out to be a useful component of future general-purpose AI models.

Indeed, this past summer, DeepMind demoed a system that combined AlphaGeometry2 with AlphaProof, an AI model for formal math reasoning, to solve four out of six problems from the 2024 IMO. In addition to geometry problems, approaches like these could be extended to other areas of math and science — for example, to aid with complex engineering calculations.

AlphaGeometry2 has several core elements, including a language model from Google’s Gemini family of AI models and a “symbolic engine.” The Gemini model helps the symbolic engine, which uses mathematical rules to infer solutions to problems, arrive at feasible proofs for a given geometry theorem.

A typical geometry problem diagram in an IMO exam. Image Credits: Google

Olympiad geometry problems are based on diagrams that need “constructs” to be added before they can be solved, such as points, lines, or circles. AlphaGeometry2’s Gemini model predicts which constructs might be useful to add to a diagram, which the engine references to make deductions.

Basically, AlphaGeometry2’s Gemini model suggests steps and constructions in a formal mathematical language to the engine, which — following specific rules — checks these steps for logical consistency. A search algorithm allows AlphaGeometry2 to conduct multiple searches for solutions in parallel and store possibly useful findings in a common knowledge base.

AlphaGeometry2 considers a problem to be “solved” when it arrives at a proof that combines the Gemini model’s suggestions with the symbolic engine’s known principles.
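
The division of labor described above can be illustrated with a toy sketch. This is not DeepMind’s implementation: facts are plain strings, the rule set is hand-written, and a fixed list of suggestions stands in for the Gemini model. The names `deduce`, `solve`, `RULES`, `FACTS`, `GOAL`, and `PROPOSALS` are hypothetical. The sketch shows a rule-based engine that stalls on an isosceles-triangle problem until an auxiliary construction (the midpoint of the base) is added.

```python
from typing import FrozenSet, List, Set, Tuple

# A rule is (premises, conclusion); all facts are plain strings (an assumption
# made for illustration, not the formal language AlphaGeometry2 uses).
Rule = Tuple[FrozenSet[str], str]


def deduce(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Symbolic step: apply rules repeatedly until no new fact appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def solve(facts: Set[str], rules: List[Rule], goal: str,
          proposals: List[str]) -> Tuple[bool, List[str]]:
    """Run the engine alone, then add proposed constructions one at a time."""
    known = deduce(facts, rules)
    used: List[str] = []
    for construct in proposals:  # stand-in for language model suggestions
        if goal in known:
            break
        used.append(construct)
        known = deduce(known | {construct}, rules)
    return goal in known, used


# Toy problem: in triangle ABC with AB = AC, show the base angles are equal.
RULES: List[Rule] = [
    (frozenset({"AB = AC", "M is the midpoint of BC"}),
     "triangle ABM is congruent to triangle ACM"),
    (frozenset({"triangle ABM is congruent to triangle ACM"}),
     "angle ABC = angle ACB"),
]
FACTS = {"AB = AC"}
GOAL = "angle ABC = angle ACB"
PROPOSALS = ["M is the midpoint of BC"]  # hypothetical model suggestion

print(GOAL in deduce(FACTS, RULES))          # False: the engine alone is stuck
print(solve(FACTS, RULES, GOAL, PROPOSALS))  # (True, ['M is the midpoint of BC'])
```

The engine never guesses: it only derives what the rules permit, so any proof it reaches is checked by construction. The proposer’s only job is to widen the set of facts the engine can work from.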

Owing to the complexities of translating proofs into a format AI can understand, there’s a dearth of usable geometry training data. So DeepMind created its own synthetic data to train AlphaGeometry2’s language model, generating over 300 million theorems and proofs of varying complexity.

The DeepMind team selected 45 geometry problems from IMO competitions over the past 25 years (from 2000 to 2024), including linear equations and equations that require moving geometric objects around a plane. They then “translated” these into a larger set of 50 problems. (For technical reasons, some problems had to be split into two.)

According to the paper, AlphaGeometry2 solved 42 out of the 50 problems, clearing the average gold medalist score of 40.9.

Granted, there are limitations. A technical quirk prevents AlphaGeometry2 from solving problems with a variable number of points, nonlinear equations, and inequalities. And AlphaGeometry2 isn’t technically the first AI system to reach gold-medal-level performance in geometry, although it’s the first to achieve it with a problem set of this size.

AlphaGeometry2 also did worse on another set of harder IMO problems. For an added challenge, the DeepMind team selected problems — 29 in total — that had been nominated for IMO exams by math experts, but that haven’t yet appeared in a competition. AlphaGeometry2 could only solve 20 of these.
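
For readers who want to check the figures, the short calculation below recomputes the solve rates from the counts reported in the article. The variable names are illustrative only; the counts (42 of 50, 20 of 29) and the 40.9 gold-medalist average are the ones quoted above.

```python
# Counts quoted in the article.
imo_solved, imo_total = 42, 50            # IMO 2000-2024 geometry set
shortlist_solved, shortlist_total = 20, 29  # harder, not-yet-used problems
gold_medalist_avg = 40.9                  # average gold medalist score on the 50

imo_rate = imo_solved / imo_total
shortlist_rate = shortlist_solved / shortlist_total
margin = round(imo_solved - gold_medalist_avg, 1)

print(f"IMO set: {imo_rate:.0%}")               # IMO set: 84%
print(f"Harder set: {shortlist_rate:.0%}")      # Harder set: 69%
print(f"Margin over gold average: {margin}")    # Margin over gold average: 1.1
```

The 84% figure in the study corresponds to 42 of 50, and the margin over the average gold medalist is about one problem.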

Still, the study results are likely to fuel the debate over whether AI systems should be built on symbol manipulation — that is, manipulating symbols that represent knowledge using rules — or the ostensibly more brain-like neural networks.

AlphaGeometry2 adopts a hybrid approach: Its Gemini model has a neural network architecture, while its symbolic engine is rules-based.

Proponents of neural network techniques argue that intelligent behavior, from speech recognition to image generation, can emerge from nothing more than massive amounts of data and computing. Unlike symbolic systems, which solve tasks by defining sets of symbol-manipulating rules dedicated to particular jobs, like editing a line in word processor software, neural networks try to solve tasks through statistical approximation and learning from examples.

Neural networks are the cornerstone of powerful AI systems like OpenAI’s o1 “reasoning” model. But supporters of symbolic AI argue that they’re not the be-all and end-all; symbolic AI systems might be better positioned to efficiently encode the world’s knowledge, reason their way through complex scenarios, and “explain” how they arrived at an answer.

“It is striking to see the contrast between continuing, spectacular progress on these kinds of benchmarks, and meanwhile, language models, including more recent ones with ‘reasoning,’ continuing to struggle with some simple commonsense problems,” Vince Conitzer, a Carnegie Mellon University computer science professor specializing in AI, told TechCrunch. “I don’t think it’s all smoke and mirrors, but it illustrates that we still don’t really know what behavior to expect from the next system. These systems are likely to be very impactful, so we urgently need to understand them and the risks they pose much better.”

AlphaGeometry2 perhaps demonstrates that the two approaches — symbol manipulation and neural networks — combined are a promising path forward in the search for generalizable AI. Indeed, according to the DeepMind paper, o1, which also has a neural network architecture, couldn’t solve any of the IMO problems that AlphaGeometry2 was able to answer.

This may not be the case forever. In the paper, the DeepMind team said it found preliminary evidence that AlphaGeometry2’s language model was capable of generating partial solutions to problems without the help of the symbolic engine.

“[The] results support ideas that large language models can be self-sufficient without depending on external tools [like symbolic engines],” the DeepMind team wrote in the paper, “but until [model] speed is improved and hallucinations are completely resolved, the tools will stay essential for math applications.”
