Meet CoAct-1: A Novel Multi-Agent System that Synergistically Combines GUI-based Control with Direct Programmatic Execution

A team of researchers from USC, Salesforce AI, and the University of Washington has introduced CoAct-1, a multi-agent computer-using agent (CUA) that treats coding as a first-class action, on par with traditional GUI manipulation. The design targets longstanding efficiency and reliability problems in complex, long-horizon computer tasks. On the demanding OSWorld benchmark, CoAct-1 achieves a state-of-the-art (SOTA) success rate of 60.76%, making it the first CUA to surpass the 60% mark.

Why CoAct-1? Bridging the Efficiency Gap in Computer-Using Agents

Conventional CUAs rely solely on pixel-based GUI interaction, emulating human users by clicking, typing, and navigating interfaces. While this approach mimics user workflows, it proves fragile and inefficient for intricate, multi-step tasks, especially those involving dense UI layouts, multi-app pipelines, or complex OS operations. A single error such as a mis-click can derail an entire workflow, and action sequences grow longer as tasks become more complex.
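To make the fragility argument concrete, here is a minimal sketch of how per-step errors compound over a long action sequence. It assumes each step succeeds independently with the same probability, which is a simplification for illustration and not a model taken from the CoAct-1 paper. The function name and the numbers are hypothetical.

```python
def chain_success_probability(per_step_success: float, num_steps: int) -> float:
    """Probability that every step in a sequence succeeds.

    Assumes steps are independent and equally reliable (an illustrative
    simplification, not a claim about any real agent).
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be in [0, 1]")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return per_step_success ** num_steps


if __name__ == "__main__":
    # Hypothetical per-step reliability; not a measurement from the paper.
    for steps in (5, 15, 50):
        probability = chain_success_probability(0.98, steps)
        print(f"{steps:>2} steps -> {probability:.3f}")
```

Under this toy model, shortening a task from dozens of GUI actions to a handful of actions raises the chance that the whole sequence completes, which is the intuition behind replacing long click sequences with short scripts.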

Efforts to mitigate these issues have included augmenting GUI agents with high-level planners, as seen in systems like GTA-1 and modular multi-agent frameworks. However, these methods cannot escape the bottleneck of GUI-centric action spaces, ultimately limiting both efficiency and robustness.

CoAct-1: Hybrid Architecture with Coding as Action

CoAct-1 takes a fundamentally different approach by integrating three specialized agents:

Orchestrator

Programmer

GUI Operator

This hybrid model enables CoAct-1 to strategically substitute brittle and lengthy mouse-keyboard operations with concise, reliable code execution, while still leveraging GUI interactions where necessary.
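The division of labor described above can be sketched as a small routing loop. This is a hedged illustration only: the class names, the `needs_visual_interaction` flag, and the placeholder worker functions are all hypothetical, and the routing rule stands in for whatever decision process the actual Orchestrator uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subtask:
    description: str
    # Hypothetical flag: a real orchestrator would decide this itself.
    needs_visual_interaction: bool


@dataclass
class StepResult:
    agent: str
    summary: str


def programmer(subtask: Subtask) -> StepResult:
    # Placeholder: a real Programmer agent would write and run a script here.
    return StepResult("programmer", f"ran script for: {subtask.description}")


def gui_operator(subtask: Subtask) -> StepResult:
    # Placeholder: a real GUI Operator would issue mouse and keyboard actions.
    return StepResult("gui_operator", f"performed GUI actions for: {subtask.description}")


def orchestrate(subtasks: List[Subtask]) -> List[StepResult]:
    """Route each subtask to exactly one worker and collect results in order."""
    results = []
    for subtask in subtasks:
        worker = gui_operator if subtask.needs_visual_interaction else programmer
        results.append(worker(subtask))
    return results


if __name__ == "__main__":
    plan = [
        Subtask("rename every file in a folder", needs_visual_interaction=False),
        Subtask("click the Export button in a dialog", needs_visual_interaction=True),
    ]
    for result in orchestrate(plan):
        print(result.agent, "->", result.summary)
```

The point of the sketch is the shape of the control flow: one coordinator, two interchangeable workers, and a per-subtask choice between code and GUI actions.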

Evaluation on OSWorld: Record-Setting Performance

OSWorld, a leading benchmark of 369 tasks spanning office productivity, IDEs, browsers, file managers, and multi-app workflows, is an exacting testbed for agentic systems. Each task is specified as a natural-language goal modeled on real-world use and is assessed by a granular rule-based scoring system.

Results

Overall SOTA Success Rate: 60.76%, making CoAct-1 the first CUA to cross the 60-point threshold

Stepped Allowance Performance

Efficiency: 10.15 steps per successful task
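As a rough illustration of how figures like these could be derived from per-task run logs, the sketch below computes a success rate and the average step count over solved tasks. The `Episode` record, its fields, and the rule that a task counts as solved only at full score are assumptions made for this example; they are not taken from the OSWorld harness or the CoAct-1 paper, and the sample data is invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Episode:
    score: float  # task score in [0, 1]; hypothetical field
    steps: int    # number of actions the agent took


def success_rate(episodes: List[Episode]) -> float:
    """Mean task score across all episodes, expressed as a percentage."""
    if not episodes:
        raise ValueError("no episodes to score")
    return 100.0 * sum(e.score for e in episodes) / len(episodes)


def mean_steps_on_success(episodes: List[Episode], threshold: float = 1.0) -> Optional[float]:
    """Average step count over episodes whose score reaches the threshold."""
    solved = [e.steps for e in episodes if e.score >= threshold]
    return sum(solved) / len(solved) if solved else None


if __name__ == "__main__":
    # Invented sample data for illustration only.
    runs = [Episode(1.0, 8), Episode(0.0, 30), Episode(1.0, 12), Episode(0.0, 15)]
    print(success_rate(runs))
    print(mean_steps_on_success(runs))
```

Counting steps only over solved tasks keeps failed runs that exhaust the step budget from inflating the efficiency figure.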

Breakdown

CoAct-1 leads across task types, with especially large gains in workflows that benefit from code execution:

Multi-App

OS Tasks

VLC

Key Insights: What Drives CoAct-1’s Gains?

Coding Actions Replace Redundant GUI Sequences

Dynamic Delegation

Improvement with Stronger Backbones

Efficiency Correlates with Reliability

Conclusion: A Leap Forward in Generalized Computer Automation

By making coding a first-class system action alongside GUI manipulation, CoAct-1 improves both success rate and efficiency, and illustrates a practical path toward scalable, reliable autonomous computer agents. Its hybrid architecture and dynamic execution logic set a new high-water mark for the CUA field.

The post Meet CoAct-1: A Novel Multi-Agent System that Synergistically Combines GUI-based Control with Direct Programmatic Execution appeared first on MarkTechPost.
