Transforming Software Development with Multi-Agent Collaboration: CodeStory’s Aide Framework Sets State-of-the-Art on SWE-Bench-Lite with 40.3% Accepted Solutions

MarkTechPost@AI, July 2, 2024


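The Aide framework, developed by the CodeStory team, achieved a 40.3% accepted-solution rate on the SWE-Bench-Lite benchmark, setting a new state of the art. Aide takes a multi-agent approach: it breaks code down into atomic code symbols, assigns each symbol to an agent, and has the agents collaborate through natural language and the Language Server Protocol (LSP).

👨‍💻 Aide breaks code down into atomic code symbols such as classes, functions, enums, or types. Each agent is responsible for one code symbol and communicates with the others in natural language. Each agent focuses on a specific unit of work: for example, an agent handling code editing can improve code structure, while an agent handling code navigation can help developers locate code quickly. This fine-grained collaboration is intended to make development more efficient.

🤝 Aide communicates over LSP so that information is passed accurately and efficiently. LSP is a standardized protocol, which gives the agents a shared way to understand each other's information and collaborate. In practice, Aide can run up to 30 agents at once on a single coding task.

📈 Aide's 40.3% accepted-solution rate on SWE-Bench-Lite demonstrates its effectiveness. The framework uses large language models such as Claude Sonnet 3.5 and GPT-4o to give the agents code editing and code navigation abilities. GPT-4o was strong at code editing, while Sonnet 3.5 was strong at organizing and navigating code; together they improved both development efficiency and code quality.

🚧 Future work on Aide includes improving collaboration between agents, increasing inference speed, lowering the cost of running agents, and achieving seamless collaboration with human developers. The framework's ultimate goal is to augment human developers, not replace them. A swarm of specialized agents takes on part of the developer's workload so that developers can focus on more complex tasks, improving the efficiency and quality of software development.

🚀 Aide points to a new way of developing software. Multi-agent collaboration could change how software is built and give developers a more efficient and convenient experience. As AI technology advances, Aide is expected to see wider use and to bring further change to software development.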

Recent developments in software engineering have raised the bar for productivity and teamwork. A team at CodeStory has developed a multi-agent coding framework called Aide that achieved 40.3% accepted solutions on the SWE-Bench-Lite benchmark, establishing a new state of the art. The framework is designed to integrate smoothly into development environments, and the team expects it to change the way developers work with code.

https://aide.dev/blog/sota-on-swe-bench-lite

The idea of numerous agents, each in charge of a particular code symbol such as a class, function, enum, or type, lies at the core of this architecture. This atomic level of granularity lets each agent concentrate on a single unit of work while communicating with the other agents in natural language. The agents also communicate through the Language Server Protocol (LSP), which gives them a standard way to exchange accurate information about the code.
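The one-agent-per-symbol idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Aide's implementation: `SymbolAgent`, `agents_for_source`, and the message format are hypothetical names, and the standard `ast` module stands in for whatever symbol discovery the framework actually uses.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class SymbolAgent:
    """Hypothetical agent that owns exactly one top-level code symbol."""
    name: str
    kind: str  # "class" or "function"
    inbox: list = field(default_factory=list)

    def send(self, other: "SymbolAgent", message: str) -> None:
        # Agents exchange plain natural-language messages (assumed format).
        other.inbox.append((self.name, message))


def agents_for_source(source: str) -> dict:
    """Create one agent per top-level class or function in the source."""
    agents = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            agents[node.name] = SymbolAgent(node.name, "class")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            agents[node.name] = SymbolAgent(node.name, "function")
    return agents


SOURCE = '''
class Invoice:
    pass

def total(invoice):
    return 0
'''

agents = agents_for_source(SOURCE)
# The agent for `total` asks the agent for `Invoice` to change its symbol.
agents["total"].send(agents["Invoice"], "I need a 'lines' attribute to sum over.")
```

In this sketch each agent only ever touches its own symbol; anything it needs from another symbol becomes a message to that symbol's agent.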

In practice, up to 30 agents can be active at once during a single run, sharing information and collaborating on decisions. The framework demonstrated its capabilities on the SWE-Bench-Lite benchmark. The run used Claude Sonnet 3.5 and GPT-4o, with an editor environment for the agents built on Pyright and Jedi. GPT-4o was exceptional at code editing, while Sonnet 3.5, known for its robust agentic behavior, was helpful in organizing and navigating the codebase.
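A cap on simultaneously active agents is commonly expressed with a semaphore. The sketch below assumes an `asyncio`-based runner; only the figure 30 comes from the article, while `run_agent`, `run_all`, and the placeholder work inside each agent are hypothetical.

```python
import asyncio

MAX_ACTIVE_AGENTS = 30  # figure from the article; everything else is illustrative


async def run_agent(symbol: str, limiter: asyncio.Semaphore, stats: dict) -> str:
    """Hypothetical agent body; the sleep stands in for model calls."""
    async with limiter:  # blocks while MAX_ACTIVE_AGENTS agents are already running
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0)  # placeholder for real work
        stats["active"] -= 1
    return f"{symbol}: done"


async def run_all(symbols, limit: int = MAX_ACTIVE_AGENTS):
    """Run one agent per symbol, never more than `limit` at the same time."""
    limiter = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(run_agent(s, limiter, stats) for s in symbols))
    return results, stats["peak"]


results, peak = asyncio.run(run_all([f"symbol_{i}" for i in range(100)]))
```

Here 100 symbols are queued, but the recorded peak of concurrently active agents never exceeds the limit.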

The agentic behavior of Sonnet 3.5 was very significant. It was the first model to propose splitting functions apart instead of making already complex ones more complex, showing a sophisticated understanding of maintainability and code structure. This behavior, along with GPT-4o's excellent code editing abilities, made the framework perform noticeably better than earlier versions.

The SWE-Bench-Lite benchmark was selected because it reflects real-world coding problems, giving agents a reliable testing environment. The benchmark configuration comprised a mock editor harness with Pyright for diagnostics and Jedi for LSP features, enabling agents to obtain information and run tests quickly without taxing system resources.
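To make the idea of a mock editor harness concrete, here is a deliberately simplified stand-in. It does not use Pyright or Jedi: Python's standard `ast` module substitutes for both, so diagnostics are limited to syntax errors and navigation to a name lookup. The class `MockEditorHarness` and its methods are hypothetical names chosen for this sketch.

```python
import ast


class MockEditorHarness:
    """Stand-in editor that answers two kinds of agent queries on one file."""

    def __init__(self, source: str):
        self.source = source

    def diagnostics(self) -> list:
        """Report syntax errors only (a real type checker reports far more)."""
        try:
            ast.parse(self.source)
        except SyntaxError as exc:
            return [f"line {exc.lineno}: {exc.msg}"]
        return []

    def go_to_definition(self, name: str):
        """Return the 1-based line where `name` is defined, or None."""
        try:
            tree = ast.parse(self.source)
        except SyntaxError:
            return None
        definition_types = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
        for node in ast.walk(tree):
            if isinstance(node, definition_types) and node.name == name:
                return node.lineno
        return None


good = MockEditorHarness("def area(w, h):\n    return w * h\n")
bad = MockEditorHarness("def area(w, h:\n    return w * h\n")
```

An agent could call `good.go_to_definition("area")` to find where a symbol lives and `bad.diagnostics()` to check whether an edit left the file in a broken state, all without starting a full editor.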

The benchmarking process yielded important lessons, one of which was the significance of agent collaboration. Working together, agents that were each in charge of a different code symbol completed tasks quickly and often fixed unrelated problems, such as lint errors or TODOs, along the way. This cooperative method not only improved code quality but also demonstrated the ability of agentic systems to handle complicated coding jobs on their own.

The team has shared that a few obstacles remain before this multi-agent framework can be fully integrated into development environments. Research is underway to ensure smooth communication between human developers and agents, handle concurrent code modifications, and preserve code stability. The team is also working to optimize the framework's performance, specifically inference speed and the cost of intelligence.

The team’s ultimate objective is to increase the capabilities of human developers rather than to replace them. The goal is to improve the accuracy and efficiency of the software development process by supplying a swarm of specialized agents, freeing developers to work on more complex problems while the agents take care of the finer details.

The post Transforming Software Development with Multi-Agent Collaboration: CodeStory’s Aide Framework Sets State-of-the-Art on SWE-Bench-Lite with 40.3% Accepted Solutions appeared first on MarkTechPost.
