MarkTechPost@AI, 8 hours ago
Meet CoAct-1: A Novel Multi-Agent System that Synergistically Combines GUI-based Control with Direct Programmatic Execution

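CoAct-1 is a pioneering multi-agent computer-using agent (CUA) that elevates coding to a first-class action on par with traditional GUI manipulation, markedly improving efficiency and reliability on complex, long-horizon computer tasks. On the OSWorld benchmark, CoAct-1 achieves a state-of-the-art (SOTA) success rate of 60.76%, making it the first CUA to pass the 60% mark. By combining task planning, scripted backend operations, and GUI interaction where necessary, the system avoids the efficiency bottlenecks and error-proneness of traditional GUI agents. It performs especially well on multi-application, operating-system, and file-management tasks, and opens a new path toward general-purpose computer automation.

💻 CoAct-1's core innovation is treating coding as a "first-class action" alongside GUI operation. Three specialized agents, the Orchestrator, the Programmer, and the GUI Operator, work together and choose, according to the task, between executing code directly (for example Python or Bash scripts) for backend operations and interacting through the GUI. This overcomes the inefficiency and error-proneness of traditional CUAs that rely on GUI operation alone.

🚀 On OSWorld, a comprehensive benchmark of 369 tasks, CoAct-1 achieves a SOTA success rate of 60.76%, standing out in particular in the 100+ step task category. It is the first CUA to exceed 60%, well ahead of leading frameworks such as GTA-1 (53.10%) and OpenAI CUA 4o (31.40%). It also needs an average of only 10.15 steps per successful task, far fewer than GTA-1's 15.22, which indicates higher efficiency.

📊 CoAct-1 is especially strong on multi-application workflows (47.88% vs. 38.34% for GTA-1) and operating-system tasks (75.00%), which typically benefit from replacing long GUI action sequences with code execution. On productivity tools (such as LibreOffice Calc and Writer) and IDEs (such as VSCode), CoAct-1 also leads or matches SOTA, demonstrating its generality across many kinds of computer tasks.

💡 CoAct-1's gains come from several key factors: coding actions replace many redundant, error-prone GUI operations; the Orchestrator's dynamic task allocation makes code and GUI operation strategically complementary; and using stronger base models (such as OpenAI CUA 4o, o3, and o4-mini) further improves overall performance, indicating that stronger model capability is key to high success rates.

📈 Efficiency and reliability are closely linked: fewer execution steps directly lower the chance of error, making step count a key predictor of task success. By optimizing the execution path, CoAct-1 improves both efficiency and the stability and reliability of automated tasks, pointing the way toward scalable, reliable autonomous computer agents.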

A team of researchers from USC, Salesforce AI, and the University of Washington has introduced CoAct-1, a multi-agent computer-using agent (CUA) that treats coding as a first-class action on par with traditional GUI manipulation. This design addresses longstanding efficiency and reliability problems in complex, long-horizon computer tasks. On the demanding OSWorld benchmark, CoAct-1 achieves a state-of-the-art (SOTA) success rate of 60.76%, making it the first CUA to surpass the 60% mark.

Why CoAct-1? Bridging the Efficiency Gap in Computer-Using Agents

Conventional CUAs rely solely on pixel-based GUI interaction, emulating human users by clicking, typing, and navigating interfaces. While this approach mimics user workflows, it proves fragile and inefficient for intricate, multi-step tasks, especially those involving dense UI layouts, multi-app pipelines, or complex OS operations. A single error such as a mis-click can derail an entire workflow, and action sequences grow longer as tasks become more complex.
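One way to see why long GUI sequences are fragile is a simple compounding-error model. The sketch below assumes, purely for illustration, that every action succeeds independently with the same probability. The function name `workflow_success` and the 0.98 per-step figure are hypothetical and do not come from the CoAct-1 evaluation.

```python
def workflow_success(per_step_success: float, steps: int) -> float:
    """Probability that all `steps` actions succeed, assuming independence."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


# Illustrative only: 0.98 is an assumed per-step success rate, not a measurement.
for steps in (5, 15, 50):
    print(steps, round(workflow_success(0.98, steps), 3))
```

Under this toy model the chance of finishing a workflow falls geometrically with its length, which is the intuition behind replacing many GUI actions with a single code action.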

Efforts to mitigate these issues have included augmenting GUI agents with high-level planners, as seen in systems like GTA-1 and modular multi-agent frameworks. However, these methods cannot escape the bottleneck of GUI-centric action spaces, ultimately limiting both efficiency and robustness.

CoAct-1: Hybrid Architecture with Coding as Action

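CoAct-1 takes a fundamentally different approach by integrating three specialized agents: an Orchestrator, which plans the task and dynamically assigns each subtask; a Programmer, which writes and executes code such as Python or Bash scripts for backend operations; and a GUI Operator, which handles the steps that require interaction through the graphical interface.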

This hybrid model enables CoAct-1 to strategically substitute brittle and lengthy mouse-keyboard operations with concise, reliable code execution, while still leveraging GUI interactions where necessary.
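The routing idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not CoAct-1's implementation: the `Subtask` class, its `scriptable` flag, and the stub functions `programmer`, `gui_operator`, and `orchestrate` are all hypothetical names, and the real system relies on language models rather than a hand-set flag to decide who acts.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Subtask:
    description: str
    scriptable: bool  # hypothetical flag: can this be done without the GUI?


def programmer(subtask: Subtask) -> str:
    # Stand-in for an agent that would write and run a Python or Bash script.
    return f"[code] {subtask.description}"


def gui_operator(subtask: Subtask) -> str:
    # Stand-in for an agent that would click and type in the interface.
    return f"[gui] {subtask.description}"


def orchestrate(subtasks: List[Subtask]) -> List[str]:
    """Route each subtask to one of the two executors and collect the results."""
    log = []
    for subtask in subtasks:
        agent: Callable[[Subtask], str] = (
            programmer if subtask.scriptable else gui_operator
        )
        log.append(agent(subtask))
    return log


plan = [
    Subtask("rename the report files by date", scriptable=True),
    Subtask("confirm the dialog shown by the desktop app", scriptable=False),
]
for line in orchestrate(plan):
    print(line)
```

The point of the sketch is the action space: a bulk file operation becomes one code action, while a step that only exists in the interface still goes to the GUI executor.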

Evaluation on OSWorld: Record-Setting Performance

OSWorld—a leading benchmark featuring 369 tasks spanning office productivity, IDEs, browsers, file managers, and multi-app workflows—proves an exacting testbed for agentic systems. Each task mirrors real-world language goals and is assessed by a granular rule-based scoring system.

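Results

CoAct-1 reaches an overall success rate of 60.76%, compared with 53.10% for GTA-1 and 31.40% for OpenAI CUA 4o. It also averages 10.15 steps per successful task, against 15.22 for GTA-1.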
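The headline figures quoted in the summary above can be compared with a few lines of arithmetic. This is only a convenience sketch: the variable names are illustrative, and the inputs are the reported OSWorld numbers for CoAct-1 and GTA-1.

```python
# Reported OSWorld figures: success rate in %, average steps per successful task.
coact1 = {"success": 60.76, "steps": 10.15}
gta1 = {"success": 53.10, "steps": 15.22}

# Absolute gap in success rate, in percentage points.
success_gap = round(coact1["success"] - gta1["success"], 2)

# Relative reduction in steps per successful task.
step_reduction = round(1 - coact1["steps"] / gta1["steps"], 3)

print(f"success gap: {success_gap} points")
print(f"step reduction: {step_reduction:.1%}")
```

On these numbers CoAct-1 is about 7.7 percentage points ahead of GTA-1 and uses roughly a third fewer steps per successful task.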

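Breakdown

CoAct-1 leads or matches the state of the art across task types, with especially large gains in workflows that benefit from code execution: 47.88% on multi-app workflows (against 38.34% for GTA-1) and 75.00% on operating-system tasks. It also leads or ties on productivity tools such as LibreOffice Calc and Writer and on IDEs such as VSCode.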

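Key Insights: What Drives CoAct-1’s Gains?

Three factors stand out. Coding actions replace many redundant, error-prone GUI operations. The Orchestrator's dynamic task allocation lets code and GUI actions complement each other. Stronger base models (such as OpenAI CUA 4o, o3, and o4-mini) raise overall performance further. Efficiency and reliability are also linked: fewer steps mean fewer chances for error, which makes step count a key predictor of task success.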

Conclusion: A Leap Forward in Generalized Computer Automation

By making coding a first-class action alongside GUI manipulation, CoAct-1 improves both success rate and efficiency, and it illustrates a practical path toward scalable, reliable autonomous computer agents. Its hybrid architecture and dynamic task allocation set a new high-water mark for the CUA field.



The post Meet CoAct-1: A Novel Multi-Agent System that Synergistically Combines GUI-based Control with Direct Programmatic Execution appeared first on MarkTechPost.
