MarkTechPost@AI, July 28, 2024
OpenDevin: An Artificial Intelligence Platform for the Development of Powerful AI Agents that Interact in Similar Ways to Those of a Human Developer

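OpenDevin is a new platform for building powerful AI agents that can carry out a wide range of tasks as flexibly as human developers. It gives agents a secure environment in which they can write and execute code, interact with the command line, and browse the web, so they can handle complex, multi-step tasks. OpenDevin also supports multi-agent collaboration: an agent can delegate a task to a specialized agent to improve overall performance.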

Developing AI agents that can autonomously perform a wide variety of tasks with the same flexibility and capability as human software developers presents a significant challenge. These tasks include writing and executing code, interacting with command lines, and browsing the web. Current AI agents often lack the necessary adaptability and generalization for such diverse and complex operations. Addressing this challenge is crucial for advancing AI research and enhancing its applicability in real-world scenarios, such as software development, web navigation, and problem-solving across various domains.

Existing methods for developing AI agents include frameworks such as AutoGPT, LangChain, and MetaGPT. These frameworks provide essential tools for agent development, including interfaces for interaction, environments for operation, and mechanisms for communication. However, each has specific limitations. AutoGPT and LangChain do not natively support sandboxed code execution or a built-in web browser, which limits their usefulness for tasks that require safe code execution and web interaction. MetaGPT supports multi-agent collaboration but lacks a standardized tool library, which hinders the development of diverse agent skills. Together, these limitations restrict the performance and applicability of current AI agents, particularly on complex, multi-step tasks that require generalization across domains.

A team of researchers from UIUC, CMU, Yale, UC Berkeley, Contextual AI, KAUST, ANU, HCMUT, Alibaba, and All Hands AI proposes OpenDevin, a comprehensive platform for developing both generalist and specialist AI agents. It addresses the limitations of existing frameworks by combining an interaction mechanism through which agents write code, use the command line, and browse, a sandboxed environment for safe code execution, and a built-in web browser for web-based tasks. Its key components are a state and event stream architecture, an agent runtime environment, and a multi-agent delegation framework. Together, these let AI agents perform a wide range of tasks by writing and executing code, interacting with command lines, and browsing the web. Because OpenDevin is open source and integrates evaluation benchmarks, it also serves as a versatile, scalable platform for both developing and assessing AI agents.
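To make the state and event stream idea more concrete, here is a minimal Python sketch. It assumes a simplified model in which every agent action and every environment observation is appended to one ordered log, and the agent picks its next step by reading that history. The names `Action`, `Observation`, `EventStream`, and `run_loop` are hypothetical and do not reproduce OpenDevin's actual code. The runtime in the usage example is a fake that only echoes commands back, so nothing is executed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


# Hypothetical event types; OpenDevin's real classes differ.
@dataclass
class Action:
    kind: str     # e.g. "run_bash", "run_ipython", "browse", "finish"
    payload: str  # command, code, or browsing instruction


@dataclass
class Observation:
    source: str   # which action kind produced this result
    content: str  # stdout, page text, error message, ...


Event = Union[Action, Observation]


@dataclass
class EventStream:
    """Append-only log of every action and observation in one session."""
    events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)

    def history(self) -> List[Event]:
        # In this sketch the agent's "state" is simply a copy of the full history.
        return list(self.events)


Agent = Callable[[List[Event]], Action]
Runtime = Callable[[Action], Observation]


def run_loop(agent: Agent, runtime: Runtime, stream: EventStream,
             max_steps: int = 10) -> EventStream:
    """Agent reads history, emits an action; runtime executes it and logs the result."""
    for _ in range(max_steps):
        action = agent(stream.history())
        stream.append(action)
        if action.kind == "finish":
            break
        stream.append(runtime(action))
    return stream


# --- Usage with a toy agent and a fake runtime (hypothetical) ---
def toy_agent(history: List[Event]) -> Action:
    # Finish as soon as any observation has come back.
    if any(isinstance(e, Observation) for e in history):
        return Action("finish", "")
    return Action("run_bash", "echo hello")


def fake_runtime(action: Action) -> Observation:
    # Stand-in for a sandbox: it does not run anything, it just reports the command.
    return Observation(action.kind, f"executed: {action.payload}")


stream = run_loop(toy_agent, fake_runtime, EventStream())
print([type(e).__name__ for e in stream.events])
# ['Action', 'Observation', 'Action']
```

Keeping the log append-only means a whole trajectory can be inspected or replayed after a run, which is one plausible reason an event-stream design suits agent development and evaluation.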

The technical implementation of OpenDevin involves several critical components. The platform provides a sandboxed operating system and a web browser so that agents can perform tasks safely and efficiently. Agents interact with the environment through a core set of general actions: executing Python code, running bash commands, and navigating web pages using BrowserGym's domain-specific language. The platform's agent runtime connects agents to these environments over SSH, ensuring secure and isolated task execution. OpenDevin also includes an AgentSkills library, a set of utility functions that agents can call to perform complex tasks. The library is designed for easy extension, so community members can contribute new tools and skills. Finally, the platform supports multi-agent collaboration, letting agents delegate tasks to specialized agents for improved performance.
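The sketch below illustrates two ideas from this paragraph under clearly stated assumptions. First, an extensible skills library, modeled here as a hypothetical `@skill` decorator that registers plain Python functions which agent-written code can then call. Second, delegation, modeled as a hypothetical `delegate` function that routes a task to a specialist by keyword. None of these names come from OpenDevin's AgentSkills library, and keyword routing is only a stand-in for however a delegating agent actually decides. `run_python_action` uses bare `exec()` and is not a sandbox; in OpenDevin, isolation comes from the sandboxed runtime described above.

```python
import contextlib
import io
from typing import Callable, Dict

# Hypothetical skill registry; not OpenDevin's actual AgentSkills code.
SKILLS: Dict[str, Callable] = {}


def skill(fn: Callable) -> Callable:
    """Decorator a contributor could use to register a new utility function."""
    SKILLS[fn.__name__] = fn
    return fn


@skill
def count_lines(text: str) -> int:
    """Toy skill: number of lines in a string."""
    return len(text.splitlines())


@skill
def find_in_text(text: str, needle: str) -> list:
    """Toy skill: 1-based line numbers that contain `needle`."""
    return [i for i, line in enumerate(text.splitlines(), start=1) if needle in line]


def run_python_action(code: str) -> str:
    """Run agent-written Python with every registered skill in scope.

    Captures stdout so it can be returned to the agent as an observation.
    Uses plain exec(): this is NOT a sandbox.
    """
    namespace = dict(SKILLS)
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception as exc:
        # Report errors back to the agent instead of crashing the loop.
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue()


# Hypothetical delegation table: keyword -> specialist (labels only).
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "browse": lambda task: f"BrowsingAgent handles: {task}",
    "code": lambda task: f"CodeActAgent handles: {task}",
}


def delegate(task: str) -> str:
    """Route a task to the first specialist whose keyword appears in it."""
    for keyword, specialist in SPECIALISTS.items():
        if keyword in task.lower():
            return specialist(task)
    return f"generalist handles: {task}"


# --- Usage ---
doc = "alpha\nbeta\nalpha beta"
print(run_python_action(f"print(count_lines({doc!r}), find_in_text({doc!r}, 'alpha'))"), end="")
# 3 [1, 3]
print(delegate("Browse the docs for the release date"))
# BrowsingAgent handles: Browse the docs for the release date
```

A decorator-based registry is one simple way to keep a skills library open to contributions: adding a tool means writing one decorated function, with no change to the code that executes agent actions.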

OpenDevin was evaluated on 15 benchmarks, including software engineering tasks such as SWE-Bench and HumanEvalFix, web browsing tasks such as WebArena and MiniWoB++, and miscellaneous assistance tasks such as GAIA and GPQA. OpenDevin's agents performed competitively across these benchmarks. On SWE-Bench Lite, CodeActAgent achieved a 26% resolve rate, comparable to other specialized agents. On HumanEvalFix, OpenDevin agents fixed 79.3% of Python bugs, significantly outperforming non-agentic approaches. On web browsing, the platform's BrowsingAgent achieved a 15.5% success rate on WebArena. These results highlight OpenDevin's effectiveness across diverse tasks and its potential as a generalist AI platform.

In conclusion, OpenDevin represents a significant advance in the development and deployment of AI agents. The platform addresses the critical challenge of building flexible, capable AI agents that can perform complex tasks autonomously. By integrating a comprehensive set of tools, environments, and evaluation frameworks, OpenDevin overcomes limitations of existing frameworks and provides a robust foundation for future AI research and applications. Its open-source, community-driven development further extends its potential impact on the field of AI.


Check out the Paper, Code, and Benchmark. All credit for this research goes to the researchers of this project.

The post OpenDevin: An Artificial Intelligence Platform for the Development of Powerful AI Agents that Interact in Similar Ways to Those of a Human Developer appeared first on MarkTechPost.
