MarkTechPost@AI, November 2, 2024
All Hands AI Open Sources OpenHands CodeAct 2.1: A New Software Development Agent to Solve Over 50% of Real GitHub Issues in SWE-Bench

 

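OpenHands CodeAct 2.1 is a new software development agent that resolves over 50% of the real GitHub issues in SWE-Bench. It addresses problems common to AI agents, is already having a notable effect on real projects, and is open source. Its performance gains come from several updates.

🥇 OpenHands CodeAct 2.1 is the first software development agent to resolve over 50% of the real GitHub issues in SWE-Bench, and it also reaches a 41.7% success rate on SWE-Bench Lite, which marks a major step forward.

💻 It adopts Anthropic's new Claude 3.5 model, which significantly improves natural language understanding and lets it better interpret the issues developers raise.

📝 The agent's actions now use function calling, which makes task execution more precise: it can invoke specific pieces of code accurately and resolve developer issues more effectively.

🚶‍♂️ The developers significantly improved directory traversal in CodeAct 2.1, reducing the cases in which the agent gets stuck in repetitive or circular tasks and making it more efficient on complex issues.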

Software development has seen an explosion in the use of AI agents over the last few years, with promises to enhance productivity, automate complex tasks, and make developers' lives easier. A significant gap remains, however, between that promise and the agents' ability to address real-world issues. Most AI agents struggle with the complexity and contextual nuance of software development, especially when solving the real GitHub issues that developers face every day. They often require extensive oversight or manual correction, which defeats their purpose. Closing this gap takes an agent that is not only smarter but can also keep up with the fast-moving, varied demands of software engineering.

All Hands AI has open-sourced OpenHands CodeAct 2.1, a new software development agent and the first to solve over 50% of the real GitHub issues in SWE-Bench, the standard benchmark for evaluating AI-assisted software engineering tools. OpenHands CodeAct 2.1 reports a 53% resolution rate on SWE-Bench and a 41.7% success rate on SWE-Bench Lite. Beyond benchmark results in controlled environments, it is also being used on actual projects to resolve real GitHub issues autonomously. Unlike tools that are either closed to contribution or too niche to be useful to the broader community, OpenHands is an open-source agent that developers can freely use, improve, and adapt. That combination of openness and competitive performance makes it a strong option for developers seeking an effective AI solution.

OpenHands CodeAct 2.1's performance improvements are primarily rooted in three updates. First, it switched to Anthropic's new Claude 3.5 model, which significantly improves natural language understanding and allows CodeAct to better interpret issues raised by developers. Second, the agent's actions now use function calling, in which the model emits a structured call (a named function plus arguments) rather than free-form text. This makes task execution more precise: the agent can invoke a specific piece of code without its intent being misinterpreted. Third, the developers made significant improvements to directory traversal, reducing instances of the agent getting stuck in repetitive or circular tasks, a common problem in earlier iterations. With better directory navigation, the agent resolves larger and more complicated issues more smoothly and efficiently.
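To make the second and third updates concrete, here is a minimal sketch of how an agent loop might validate a structured function call and refuse to repeat the same action indefinitely. It does not reproduce OpenHands code: the `list_dir` tool, the `TOOLS` schema, the in-memory `FAKE_REPO`, and the `Dispatcher` class with its `max_repeats` and `window` parameters are all hypothetical names chosen for illustration.

```python
import json
from collections import deque

# Hypothetical tool schema, written in the JSON-schema style that
# function-calling interfaces commonly use. Not taken from OpenHands.
TOOLS = {
    "list_dir": {
        "description": "List entries in a directory of the repository.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}

# Stand-in for a real repository so the sketch runs without touching disk.
FAKE_REPO = {".": ["src", "README.md"], "src": ["app.py"]}


def list_dir(path):
    """Return the entries of a directory in the fake repository."""
    return FAKE_REPO.get(path, [])


HANDLERS = {"list_dir": list_dir}


class Dispatcher:
    """Validate structured calls and block actions repeated too often."""

    def __init__(self, max_repeats=2, window=6):
        self.recent = deque(maxlen=window)  # most recent (name, args) keys
        self.max_repeats = max_repeats

    def dispatch(self, call):
        name = call.get("name")
        args = call.get("arguments", {})
        if name not in TOOLS:
            return {"ok": False, "error": f"unknown tool: {name}"}

        # Check the arguments against the declared schema before running.
        schema = TOOLS[name]["parameters"]
        missing = [k for k in schema["required"] if k not in args]
        unknown = [k for k in args if k not in schema["properties"]]
        if missing or unknown:
            return {"ok": False, "error": "arguments do not match schema"}

        # Refuse an identical call once it has filled its repeat budget.
        key = (name, json.dumps(args, sort_keys=True))
        if self.recent.count(key) >= self.max_repeats:
            return {"ok": False, "error": "repeated action; try another step"}
        self.recent.append(key)

        return {"ok": True, "result": HANDLERS[name](**args)}
```

With `max_repeats=2`, the first two identical `list_dir` calls succeed and the third returns an error message that the model can read and react to, which is one simple way to nudge an agent out of a circular traversal.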

These updates matter in practice. A 53% resolve rate on SWE-Bench means that more than half of the benchmark's issues were solved without human intervention. Because SWE-Bench is built to be representative of the real GitHub issues that developers face, this result suggests that OpenHands CodeAct 2.1 can take a substantial share of issue resolution off developers' hands, freeing them to focus on higher-level work. The open-source nature of OpenHands also invites developers around the world to contribute and improve the agent. On SWE-Bench Lite, OpenHands CodeAct 2.1 achieved a 41.7% resolve rate, which further supports its ability to handle less complex issues, the kind that can still disrupt a development pipeline when left unaddressed.
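For readers who want to see what a resolve rate means numerically, the sketch below computes it as resolved instances divided by total instances. SWE-Bench Lite contains 300 task instances; the resolved count of 125 is an assumption chosen because it is consistent with the reported 41.7%, not a figure stated in the announcement.

```python
def resolve_rate(resolved, total):
    """Return the percentage of benchmark instances resolved, to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= resolved <= total:
        raise ValueError("resolved must be between 0 and total")
    return round(100 * resolved / total, 1)


LITE_TOTAL = 300        # number of instances in SWE-Bench Lite
ASSUMED_RESOLVED = 125  # assumption: a count consistent with 41.7%

print(resolve_rate(ASSUMED_RESOLVED, LITE_TOTAL))  # prints 41.7
```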

In conclusion, OpenHands CodeAct 2.1 is a notable advance in AI-driven software development and a step toward autonomous coding assistants that genuinely enhance productivity. Solving over 50% of the real GitHub issues in SWE-Bench demonstrates both technical progress and practical usability. Because OpenHands is open source, it remains a community-driven effort with room for continued improvement. Developers can run OpenHands locally, integrate it through GitHub Actions, or sign up for the soon-to-be-released online version. With Anthropic's Claude 3.5, function calling, and improved directory traversal, OpenHands CodeAct 2.1 sets a high bar for what an AI development agent should be: effective, accessible, and continuously evolving.


Check out the Details and GitHub here. All credit for this research goes to the researchers of this project.


The post All Hands AI Open Sources OpenHands CodeAct 2.1: A New Software Development Agent to Solve Over 50% of Real Github Issues in SWE-Bench appeared first on MarkTechPost.
