MarkTechPost@AI, June 7, 01:40
Darwin Gödel Machine: A Self-Improving AI Agent That Evolves Code Using Foundation Models and Real-World Benchmarks

 

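This article introduces the Darwin Gödel Machine (DGM), a new kind of AI system that can evolve autonomously. DGM improves itself by repeatedly modifying its own code and evaluating each change against performance metrics from real-world coding benchmarks such as SWE-bench and Polyglot. It uses frozen foundation models for code execution and generation and, much like biological evolution, retains and refines the variants that perform well. Experiments show that DGM achieves substantial performance gains on coding tasks and surpasses conventional baselines. DGM offers a new approach to building more adaptable AI systems and may extend to broader domains in the future.

💡 Limits of traditional AI: Traditional AI systems are constrained by static architectures and cannot improve on their own after deployment. DGM, inspired by the way human science advances, evolves continuously through code modification and performance feedback.

⚙️ How DGM works: The core of DGM is its ability to modify itself: it evolves by iteratively editing its own code. The system uses frozen foundation models to assist with code execution and generation, and it evaluates different versions of its code by their performance on coding benchmarks such as SWE-bench and Polyglot.

📈 Experimental results: DGM's performance on SWE-bench rose from 20.0% to 50.0%, and its accuracy on Polyglot rose from 14.2% to 30.7%, showing that it can optimize its own architecture and reasoning strategies without human intervention.

⚠️ Technical significance and limitations: DGM treats AI improvement as a search problem, exploring agent architectures through trial and error. Although it is computationally intensive and does not yet surpass expert-tuned closed systems, it offers a scalable path toward open-ended AI evolution in software engineering and other fields.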

Introduction: The Limits of Traditional AI Systems

Conventional artificial intelligence systems are limited by their static architectures. These models operate within fixed, human-engineered frameworks and cannot autonomously improve after deployment. In contrast, human scientific progress is iterative and cumulative—each advancement builds upon prior insights. Taking inspiration from this model of continuous refinement, AI researchers are now exploring evolutionary and self-reflective techniques that allow machines to improve through code modification and performance feedback.

Darwin Gödel Machine: A Practical Framework for Self-Improving AI

Researchers from Sakana AI, the University of British Columbia, and the Vector Institute have introduced the Darwin Gödel Machine (DGM), a novel self-modifying AI system designed to evolve autonomously. Unlike the theoretical Gödel Machine, which accepts a self-modification only when it can be proven beneficial, DGM relies on empirical evidence. The system evolves by continuously editing its own code, guided by performance metrics from real-world coding benchmarks such as SWE-bench and Polyglot.

Foundation Models and Evolutionary AI Design

To drive this self-improvement loop, DGM uses frozen foundation models that facilitate code execution and generation. It begins with a base coding agent capable of self-editing, then iteratively modifies it to produce new agent variants. These variants are evaluated and retained in an archive if they demonstrate successful compilation and self-improvement. This open-ended search process mimics biological evolution—preserving diversity and enabling previously suboptimal designs to become the basis for future breakthroughs.
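The archive-based loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose_edit` and `evaluate` are hypothetical stand-ins for the frozen foundation model and the benchmark harness, the score-weighted parent sampling is one plausible way to keep weaker designs in play, and the retention rule is simplified to "keep every variant that runs".

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    code: str
    score: float
    parent: Optional[int] = None  # archive index of the parent agent


def propose_edit(parent_code: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a frozen foundation model proposing a self-edit."""
    return parent_code + f"\n# tweak {rng.randint(0, 9999)}"


def evaluate(code: str, rng: random.Random) -> Optional[float]:
    """Hypothetical stand-in for benchmark scoring.

    Returns None when the variant fails to compile or run, else a score in [0, 1).
    """
    if rng.random() < 0.2:
        return None
    return rng.random()


def evolve(steps: int, seed: int = 0) -> List[Agent]:
    rng = random.Random(seed)
    archive = [Agent(code="# base agent", score=0.2)]
    for _ in range(steps):
        # Sample a parent from the whole archive, weighted by score.
        # The small constant keeps low-scoring agents selectable, so a
        # currently suboptimal design can still seed later variants.
        weights = [agent.score + 0.05 for agent in archive]
        parent_idx = rng.choices(range(len(archive)), weights=weights, k=1)[0]
        child_code = propose_edit(archive[parent_idx].code, rng)
        score = evaluate(child_code, rng)
        if score is None:
            continue  # discard variants that do not run
        # Assumption: every valid variant is archived, not only the best one.
        archive.append(Agent(code=child_code, score=score, parent=parent_idx))
    return archive


if __name__ == "__main__":
    final_archive = evolve(steps=30, seed=1)
    best = max(final_archive, key=lambda agent: agent.score)
    print(f"archive size: {len(final_archive)}, best score: {best.score:.3f}")
```

Keeping the full archive, rather than only the current best agent, is what separates this kind of open-ended search from plain hill climbing: the `parent` field records a lineage tree in which any branch can be revisited.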

Benchmark Results: Validating Progress on SWE-bench and Polyglot

DGM was tested on two well-known coding benchmarks:

SWE-bench: performance rose from 20.0% to 50.0%.
Polyglot: accuracy rose from 14.2% to 30.7%.

These results highlight DGM’s ability to evolve its architecture and reasoning strategies without human intervention. The study also compared DGM with simplified variants that lacked self-modification or exploration capabilities, confirming that both elements are critical for sustained performance improvements. Notably, DGM even outperformed hand-tuned systems like Aider in multiple scenarios.
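To put the reported scores in perspective (20.0% to 50.0% on SWE-bench, 14.2% to 30.7% on Polyglot), the short calculation below separates the absolute gain in percentage points from the ratio of final to starting score. The helper name `gain` is only illustrative.

```python
from typing import Tuple


def gain(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, ratio of after to before)."""
    return round(after - before, 1), round(after / before, 2)


# Scores as reported in the article, in percent.
results = {"SWE-bench": (20.0, 50.0), "Polyglot": (14.2, 30.7)}

for name, (before, after) in results.items():
    points, ratio = gain(before, after)
    print(f"{name}: +{points} points, {ratio}x the starting score")
```

Both benchmarks end at more than twice their starting score, although the absolute gain on SWE-bench (30.0 points) is nearly double that on Polyglot (16.5 points).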

Technical Significance and Limitations

DGM represents a practical reinterpretation of the Gödel Machine by shifting from logical proof to evidence-driven iteration. It treats AI improvement as a search problem—exploring agent architectures through trial and error. While still computationally intensive and not yet on par with expert-tuned closed systems, the framework offers a scalable path toward open-ended AI evolution in software engineering and beyond.

Conclusion: Toward General, Self-Evolving AI Architectures

The Darwin Gödel Machine shows that AI systems can autonomously refine themselves through a cycle of code modification, evaluation, and selection. By integrating foundation models, real-world benchmarks, and evolutionary search principles, DGM demonstrates meaningful performance gains and lays the groundwork for more adaptable AI. While current applications are limited to code generation, future versions could expand to broader domains—moving closer to general-purpose, self-improving AI systems aligned with human goals.



The post Darwin Gödel Machine: A Self-Improving AI Agent That Evolves Code Using Foundation Models and Real-World Benchmarks appeared first on MarkTechPost.
