MarkTechPost@AI, August 16, 2024
Salesforce AI Research Proposes DEI: AI Software Engineering Agents Org, Achieving a 34.3% Resolve Rate on SWE-Bench Lite, Crushing Closed-Source Systems

 

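Researchers from Salesforce AI Research and Carnegie Mellon University have proposed a framework called Diversity Empowered Intelligence (DEI). It aims to combine the distinct strengths of multiple software engineering agents into a more unified, powerful, and collaborative problem-solving entity. DEI acts as a meta-module that can be applied on top of existing software engineering agents so that they work together in a coordinated way. By guiding and managing this collaboration, DEI substantially improves the agents' collective ability to resolve complex software issues, beyond what any single agent achieves alone.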

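🤖 DEI evaluates the candidate solutions produced by the different software engineering agents and judges how effective each one is for the given environment and the issue at hand. A re-ranking pipeline then determines which patch is selected and applied. What sets DEI apart is that it draws on the diverse expertise of different agents, which lets the group resolve a wider range of issues with higher accuracy.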

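💪 DEI was evaluated on SWE-Bench Lite, a benchmark that measures how well software engineering agents resolve real-world GitHub issues. A DEI-guided committee of agents reached a 34.3% resolve rate, a 25% relative improvement over the 27.3% achieved by the best individual agent.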

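🚀 The best-performing group under DEI reached a 55% resolve rate, the highest value recorded on SWE-Bench Lite. This exceeds what the individual agents and many closed-source systems achieve, and it illustrates the potential of collaborative AI systems.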

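💡 DEI integrates the diverse capabilities of multiple software engineering agents to address the challenge of resolving complex software issues in large codebases. Testing supports the framework's ability to improve performance through collaboration and re-ranking: it reached a 34.3% resolve rate and a 55% peak on SWE-Bench Lite. These findings underline the value of diversity in AI systems, which can bring more innovation, efficiency, and problem-solving capability to software engineering.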

Software engineering is being transformed by automation, particularly through large language models (LLMs). Tasks such as generating code, writing tests, and checking for bugs, traditionally done by human engineers, can now be handled by LLM-based agents that understand and produce human-like text and carry out complex development operations. However, the full potential of these agents has not yet been realized, because each one typically focuses on a single task, which yields fragmented solutions to software engineering challenges.

A central challenge in software engineering is debugging issues in large codebases, such as those hosted on GitHub. These codebases are large and complex, which makes it hard to understand how the software was designed and how it behaves. SWE agents were developed to address such issues by automatically generating bug patches. The task is demanding: an agent must navigate a large repository and reason about complex interactions between functions before it can produce an accurate fix. To date, no single agent has mastered every aspect of this task, and results are often suboptimal and inconsistent.

Researchers have developed a number of AI agents that emphasize different aspects of software issue resolution. Some are good at reproducing bugs in a development environment to understand the problem better, while others specialize in patch generation or code review. The problem is that these agents usually operate in isolation and achieve limited success. Without a framework for collaboration, their diverse strengths cannot be fully harnessed, which leads to bottlenecks and missed opportunities for more efficient problem solving.
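To make the point about complementary strengths concrete, here is a minimal sketch using made-up data. The agent names, issue IDs, and counts are hypothetical and are not taken from the paper. It shows that when agents resolve different subsets of issues, the union of what they resolve can be much larger than what the best single agent resolves.

```python
# Toy illustration only: agent names and issue IDs are invented.
resolved_by_agent: dict[str, set[int]] = {
    "agent_a": {1, 2, 3, 4},
    "agent_b": {3, 4, 5, 6},
    "agent_c": {1, 6, 7},
}
total_issues = 10


def resolve_rate(resolved: set[int], total: int) -> float:
    """Fraction of issues resolved."""
    return len(resolved) / total


# Best rate achieved by any single agent working alone.
best_single = max(
    resolve_rate(issues, total_issues) for issues in resolved_by_agent.values()
)

# Upper bound for a committee: an issue counts if at least one agent resolved it.
union_resolved = set().union(*resolved_by_agent.values())
union_rate = resolve_rate(union_resolved, total_issues)

print(f"best single agent: {best_single:.0%}, union of agents: {union_rate:.0%}")
```

The union is only an upper bound. Reaching it would require always picking the right agent's patch for each issue, which is the selection problem a collaboration framework has to solve.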

Researchers from the Salesforce AI Research team and Carnegie Mellon University proposed the Diversity Empowered Intelligence (DEI) framework. DEI is designed to bring together multiple software engineering agents and use their distinct strengths to form a more unified, powerful, and cohesive problem-solving entity. The framework functions as a meta-module that can be applied to existing SWE agents, allowing them to cooperate in a coordinated manner. By guiding and managing these collaborative efforts, DEI substantially improves the agents' collective ability to solve complex software problems compared with any single agent working alone.

The DEI framework evaluates the solutions that the various software engineering agents produce and judges how effective each solution is in the given context and for the issue at hand. A re-ranking pipeline then ensures that the most promising patch is selected and applied. DEI works because the diverse expertise of different agents lets the group solve a wider range of problems with higher accuracy. The framework is also scalable: it is designed to integrate with any existing SWE agent framework, which fosters a more collaborative and efficient software engineering environment.
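The evaluate-then-re-rank idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `CandidatePatch`, `rerank`, `select_patch`, and `keyword_overlap_score` are hypothetical names, and the keyword-overlap scorer is a deliberately crude placeholder for whatever evaluator a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class CandidatePatch:
    agent: str  # which agent proposed the patch
    diff: str   # the patch text


# Hypothetical scorer signature: (issue text, patch) -> score, higher is better.
Scorer = Callable[[str, CandidatePatch], float]


def rerank(
    issue: str, candidates: Sequence[CandidatePatch], score: Scorer
) -> list[CandidatePatch]:
    """Return the candidates ordered from highest to lowest score."""
    return sorted(candidates, key=lambda c: score(issue, c), reverse=True)


def select_patch(
    issue: str, candidates: Sequence[CandidatePatch], score: Scorer
) -> CandidatePatch:
    """Pick the top-ranked candidate for the issue."""
    if not candidates:
        raise ValueError("no candidate patches to choose from")
    return rerank(issue, candidates, score)[0]


def keyword_overlap_score(issue: str, patch: CandidatePatch) -> float:
    """Placeholder scorer: fraction of issue words that appear in the diff."""
    words = set(issue.lower().split())
    if not words:
        return 0.0
    diff_text = patch.diff.lower()
    return sum(word in diff_text for word in words) / len(words)


issue = "crash when parsing empty config"
candidates = [
    CandidatePatch("agent_a", "- return data\n+ return data or {}"),
    CandidatePatch(
        "agent_b", "+ if not config: return {}  # handle empty config when parsing"
    ),
]
best = select_patch(issue, candidates, keyword_overlap_score)
print(best.agent)
```

Passing the scorer in as a function keeps the selection logic independent of how patches are judged, so the placeholder can be swapped for a stronger evaluator without touching `rerank` or `select_patch`.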

The DEI framework was tested on SWE-Bench Lite, a benchmark designed to evaluate how well software engineering agents resolve real-world GitHub issues. The results are strong. Where the best individual agent resolved at most 27.3% of the SWE-Bench Lite issues, the DEI-guided committee resolved 34.3%, a 25% relative improvement. The best-performing group working under DEI achieved a resolve rate of 55%, the highest value recorded on SWE-Bench Lite. This performance surpasses what single agents and many closed-source systems achieve, and it shows the potential of collaborative AI systems.
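The 25% figure is a relative improvement, not a gain of 25 percentage points. A short check of the arithmetic, using only the two rates quoted above (the variable names are illustrative):

```python
best_single_rate = 27.3  # best individual agent, percent resolved
committee_rate = 34.3    # DEI-guided committee, percent resolved

# Gain in percentage points.
absolute_gain = committee_rate - best_single_rate

# Gain relative to the best individual agent, in percent.
relative_gain = absolute_gain / best_single_rate * 100

print(f"{absolute_gain:.1f} points, {relative_gain:.1f}% relative")
# prints "7.0 points, 25.6% relative"
```

The relative gain is about 25.6%, which the article rounds to 25%.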

In conclusion, the Diversity Empowered Intelligence (DEI) framework integrates the diverse capabilities of multiple SWE agents to address the challenge of resolving complex software issues in large codebases. Testing supports the framework's ability to improve performance through collaboration and re-ranking: it reached a 34.3% resolve rate and a 55% peak resolve rate on SWE-Bench Lite. These findings underscore the importance of diversity in AI systems, which can lead to greater innovation, efficiency, and problem-solving capability in software engineering.


