MarkTechPost@AI, July 28, 2024
Emergence AI Proposes Agent-E: A Web Agent Achieving 73.2% Success Rate with a 20% Improvement in Autonomous Web Navigation

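Emergence AI has introduced Agent-E, a new web agent designed to overcome the shortcomings of existing systems and raise the success rate of autonomous web navigation. Agent-E uses a hierarchical architecture that splits task planning and execution into two separate components: a planner agent and a browser navigation agent. This separation lets each component focus on its own role, improving efficiency and performance. On the WebVoyager benchmark, Agent-E achieved a 73.2% success rate, a 20% improvement over previous text-only web agents and a 16% improvement over multi-modal web agents.

🤖 Agent-E uses a hierarchical architecture that splits task planning and execution into two separate components: a planner agent and a browser navigation agent. This separation lets each component focus on its own role, improving efficiency and performance.

🔍 Agent-E uses flexible DOM distillation to select the most relevant DOM representation for each task, reducing noise and focusing on task-specific information.

📈 On the WebVoyager benchmark, Agent-E achieved a 73.2% success rate, a 20% improvement over previous text-only web agents and a 16% improvement over multi-modal web agents.

⏱️ Agent-E takes an average of 150 seconds to complete a task successfully and 220 seconds on tasks that fail. It needs an average of 25 LLM calls per task.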

Autonomous web navigation focuses on developing AI agents capable of performing complex online tasks. These tasks range from data retrieval and form submissions to more intricate activities like finding the cheapest flights or booking accommodations. By leveraging large language models (LLMs) and other AI methodologies, autonomous web navigation aims to enhance productivity in both consumer and enterprise domains by automating tasks that are typically manual and time-consuming.

This research addresses a central weakness of current web agents: they are inefficient and error-prone. Traditional web agents struggle with noisy, expansive HTML Document Object Models (DOMs) and with the dynamic nature of modern web pages. They often fail to perform tasks accurately because they cannot cope with the complexity and variability of web content. This is a significant barrier to deploying autonomous web agents in real-world applications, where reliability and precision are crucial.

Existing methods employed by web agents include encoding the DOM, using screenshots, and utilizing accessibility trees. Despite these techniques, current systems often fall short because they use a flat encoding of the DOM that does not capture the hierarchical structure of web pages. This leads to suboptimal performance, with agents failing to complete tasks or providing incorrect outputs. These limitations necessitate a more sophisticated approach to web navigation and task execution.

Researchers at Emergence AI introduced Agent-E, a novel web agent designed to overcome the shortcomings of existing systems. Agent-E’s hierarchical architecture divides the task planning and execution phases into two distinct components: the planner agent and the browser navigation agent. This separation allows each component to focus on its specific role, improving efficiency and performance. The planner agent decomposes tasks into sub-tasks, which are then executed by the browser navigation agent using advanced DOM distillation techniques.
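The split between planning and execution described above can be sketched in a few lines of Python. This is a minimal illustration, not Agent-E's actual code: the class names `PlannerAgent` and `BrowserNavigationAgent`, the `run_task` loop, and the toy `decompose`/`execute` functions are all assumptions made for the example, and the LLM and browser calls are replaced by plain functions so the sketch runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SubTaskResult:
    sub_task: str
    success: bool
    observation: str


@dataclass
class PlannerAgent:
    """Hypothetical planner: turns a user task into ordered sub-tasks.

    A real planner would call an LLM; a caller-supplied function
    stands in for that call here.
    """
    decompose: Callable[[str], List[str]]

    def plan(self, task: str) -> List[str]:
        return self.decompose(task)


@dataclass
class BrowserNavigationAgent:
    """Hypothetical executor: carries out one sub-task at a time."""
    execute: Callable[[str], SubTaskResult]
    history: List[SubTaskResult] = field(default_factory=list)

    def run(self, sub_task: str) -> SubTaskResult:
        result = self.execute(sub_task)
        self.history.append(result)
        return result


def run_task(task: str, planner: PlannerAgent,
             navigator: BrowserNavigationAgent) -> List[SubTaskResult]:
    """Plan once, then hand each sub-task to the navigator in order."""
    results = []
    for sub_task in planner.plan(task):
        result = navigator.run(sub_task)
        results.append(result)
        if not result.success:
            break  # stop at the first failure; a real planner might re-plan
    return results


# Toy stand-ins for the LLM planner and the browser (assumptions).
def toy_decompose(task: str) -> List[str]:
    return [step.strip() for step in task.split(" then ")]


def toy_execute(sub_task: str) -> SubTaskResult:
    ok = "fail" not in sub_task
    status = "done" if ok else "could not do"
    return SubTaskResult(sub_task, ok, f"{status}: {sub_task}")


results = run_task(
    "open site then search flights then fail checkout then confirm",
    PlannerAgent(toy_decompose),
    BrowserNavigationAgent(toy_execute),
)
for r in results:
    print(r.observation)
```

Keeping the planner unaware of browser details means either side can be swapped out (a different LLM prompt, a different browser driver) without touching the other, which is the design benefit the article attributes to the separation.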

The methodology of Agent-E involves several innovative steps to manage noisy and expansive web content effectively. The planner agent breaks down user tasks into smaller sub-tasks and assigns them to the browser navigation agent. This agent uses flexible DOM distillation techniques to select the most relevant DOM representation for each task, reducing noise and focusing on task-specific information. Agent-E employs change observation to monitor state changes during task execution, providing feedback that enhances the agent’s performance and accuracy.
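To make DOM distillation and change observation concrete, here is a rough sketch using only Python's standard `html.parser`. It is not Agent-E's implementation: the mode names `"text_only"` and `"interactive_only"`, the choice of which tags count as interactive, and the `observe_change` helper are assumptions picked for illustration. The idea shown is that the agent requests a reduced view of the page suited to the sub-task, then compares views before and after an action to see what changed.

```python
from html.parser import HTMLParser

# Assumed tag sets for this sketch; a real agent may choose differently.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
SKIPPED_TAGS = {"script", "style"}


class Distiller(HTMLParser):
    """Collects visible text and interactive elements from raw HTML."""

    def __init__(self):
        super().__init__()
        self.text_chunks = []
        self.interactive = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag in INTERACTIVE_TAGS:
            element = {"tag": tag}
            element.update({k: v for k, v in attrs if v is not None})
            self.interactive.append(element)

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script/style bodies and whitespace-only text nodes.
        if self._skip_depth == 0 and data.strip():
            self.text_chunks.append(data.strip())


def distill(html, mode):
    """Return a reduced view of the page for the given (hypothetical) mode."""
    parser = Distiller()
    parser.feed(html)
    parser.close()
    if mode == "text_only":
        return parser.text_chunks
    if mode == "interactive_only":
        return parser.interactive
    raise ValueError(f"unknown mode: {mode}")


def observe_change(before_html, after_html):
    """Report visible text that appeared or disappeared after an action."""
    before = distill(before_html, "text_only")
    after = distill(after_html, "text_only")
    return {
        "added": [t for t in after if t not in before],
        "removed": [t for t in before if t not in after],
    }


PAGE = (
    "<html><head><style>p{color:red}</style>"
    "<script>var x = 1;</script></head><body>"
    "<h1>Flights</h1><p>Cheapest fares</p>"
    '<input name="q" type="text"><button id="go">Search</button>'
    "</body></html>"
)
AFTER = PAGE.replace("Cheapest fares", "3 results found")

print(distill(PAGE, "text_only"))
print(distill(PAGE, "interactive_only"))
print(observe_change(PAGE, AFTER))
```

A sub-task such as "read the price" would ask for the text view, while "fill in the search box" would ask for the interactive view, so the LLM sees far less markup than the full DOM in either case.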

Evaluations using the WebVoyager benchmark demonstrated that Agent-E significantly outperforms previous state-of-the-art web agents. Agent-E achieved a success rate of 73.2%, marking a 20% improvement over previous text-only web agents and a 16% increase over multi-modal web agents. On complex sites like Wolfram Alpha, Agent-E’s performance improvement reached up to 30%. Beyond success rates, the research team reported additional metrics such as task completion times and error awareness. Agent-E averaged 150 seconds to complete a task successfully and 220 seconds on tasks that failed, and it required an average of 25 LLM calls per task.
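The reported figures can be combined into a rough per-task cost estimate. The short calculation below is a back-of-the-envelope sketch, not a result from the paper: it assumes the 150-second and 220-second averages and the 25-call average all describe the same set of benchmark tasks, and it weights the two durations by the 73.2% success rate. The function name is made up for the example.

```python
# Figures quoted in the article.
SUCCESS_RATE = 0.732
SECONDS_ON_SUCCESS = 150
SECONDS_ON_FAILURE = 220
LLM_CALLS_PER_TASK = 25


def blended_seconds_per_task(success_rate, t_success, t_failure):
    """Average wall-clock time per attempted task, weighted by outcome."""
    return success_rate * t_success + (1 - success_rate) * t_failure


avg_seconds = blended_seconds_per_task(
    SUCCESS_RATE, SECONDS_ON_SUCCESS, SECONDS_ON_FAILURE
)
# Upper bound on time per LLM call: the total also includes browser waits.
seconds_per_call = avg_seconds / LLM_CALLS_PER_TASK

print(f"{avg_seconds:.1f} s per attempted task")
print(f"{seconds_per_call:.2f} s per LLM call (upper bound)")
```

Under these assumptions an attempted task costs about 169 seconds on average, or just under 7 seconds of wall-clock time per LLM call, with page loads and other browser waits included in that figure.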

In conclusion, the research conducted by Emergence AI represents a significant advancement in autonomous web navigation. By addressing the inefficiencies of current web agents through a hierarchical architecture and advanced DOM management techniques, Agent-E sets a new benchmark for performance and reliability. The study’s findings suggest that these innovations could be applied beyond web automation to other areas of AI-driven automation, offering valuable insights into the design principles of agentic systems. Agent-E’s success in achieving a 73.2% task completion rate and efficient task execution process underscores its potential for transforming web navigation and automation.



