MarkTechPost@AI, October 26, 2024
CMU Researchers Propose API-Based Web Agents: A Novel AI Approach to Web Agents by Enabling them to Use APIs in Addition to Traditional Web-Browsing Techniques

 

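AI agents play an important role in web environments, but traditional browsing techniques have limitations. Researchers at Carnegie Mellon University propose an API-calling agent and a Hybrid Agent; the latter performs strongly on the WebArena benchmark, and the work points to promising progress in AI-driven web navigation.

💻 AI agents matter in web environments, but traditional web navigation limits machine efficiency, especially on complex, image-heavy interfaces and inconsistent website interfaces.

🚀 The Carnegie Mellon researchers propose an API-calling agent that interacts with data directly through APIs, in formats such as JSON or XML, bypassing human-like browsing actions.

🤝 The team also developed a Hybrid Agent that switches seamlessly between API calls and traditional web browsing according to task requirements, improving speed, precision, and adaptability.

📈 On the WebArena benchmark, the Hybrid Agent outperformed traditional browsing agents, reaching an average accuracy of 35.8% and improving the success rate by more than 20% on complex tasks.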

AI agents have become essential tools for navigating web environments and performing tasks such as online shopping, project management, and content browsing. Typically, these agents simulate human actions, such as clicks and scrolls, on websites designed primarily for visual, human interaction. Although practical, this method of web navigation limits machine efficiency, especially when tasks involve complex, image-heavy interfaces. The field of AI agent design thus faces a critical question: how can these agents perform web tasks with greater speed and accuracy, especially when website interfaces are inconsistent or suboptimal for machine use? This challenge has led researchers to explore alternatives to traditional browsing techniques.

AI agents operating purely through web navigation often encounter obstacles, such as the need for multiple steps to retrieve information buried within a website’s structure. One of the primary challenges is that web-based tasks are not uniformly designed for machines. The problem is compounded by platforms that lack direct, machine-compatible access points. As a result, agents rely on complex action sequences to simulate browsing, creating inefficiencies that reduce accuracy and require substantial computational resources. The overarching problem is that existing web-browsing agents lack flexibility when working with data structured primarily for human interfaces, which affects task efficiency and limits the range of feasible online activities.

Existing AI navigation methods are primarily GUI-based, meaning they depend on accessibility trees to interpret and act on web elements like buttons and links. This approach, while functional, restricts agents to human-centric browsing sequences. Agents can access simplified versions of HTML DOM structures, but limitations arise when dealing with dynamically loaded content, image-heavy interfaces, or tasks involving extensive, repetitive actions. Browsing agents, designed for simpler and more direct tasks, generally struggle with web interfaces that require numerous sequential steps to find specific data, which often limits their performance.

Researchers from Carnegie Mellon University have introduced two innovative types of agents to enhance web task performance:

    API-calling agent: The API-calling agent completes tasks solely through APIs, interacting directly with data in formats like JSON or XML, which bypasses the need for human-like browsing actions.

    Hybrid Agent: Due to the limitations of API-only methods, the team also developed a Hybrid Agent, which can seamlessly alternate between API calls and traditional web browsing based on task requirements. This hybrid approach allows the agent to leverage APIs for efficient, direct data retrieval when available and switch to browsing when API support is limited or incomplete. By integrating both methods, this flexible model enhances speed, precision, and adaptability, allowing agents to navigate the web more effectively and tackle various tasks across diverse online environments.
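To make the API-calling style concrete, here is a minimal Python sketch. The payload, its field names, and the `count_open_issues` helper are hypothetical illustrations, not taken from the paper or from any real service's API. The point is only that a structured JSON response can be answered with a single parse, where a browsing agent would have to click through and read several pages.

```python
import json

# Hypothetical payload standing in for what an issue-tracker API might return.
# The field names ("id", "title", "state") are assumptions for illustration.
SAMPLE_RESPONSE = """
[
  {"id": 1, "title": "Fix login bug", "state": "opened"},
  {"id": 2, "title": "Update docs", "state": "closed"},
  {"id": 3, "title": "Add dark mode", "state": "opened"}
]
"""


def count_open_issues(raw_json: str) -> int:
    """Parse the structured response and count issues whose state is 'opened'."""
    issues = json.loads(raw_json)
    return sum(1 for issue in issues if issue.get("state") == "opened")


if __name__ == "__main__":
    print(count_open_issues(SAMPLE_RESPONSE))
```

An agent answering "how many issues are open?" this way needs one request and one parse, with no scrolling or pagination clicks.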

The technology behind the hybrid agent is engineered to optimize data retrieval. By relying on API calls, agents can bypass traditional navigation sequences, retrieving structured data directly. This method also supports dynamic switching, where agents transition to GUI navigation when encountering unstructured or undocumented online content. This adaptability is particularly useful on websites with inconsistent API support, as the agent can revert to browsing to perform actions where APIs are absent. The dual-action capability improves agent versatility, enabling it to handle a wider array of web tasks by adapting its approach based on the available interaction formats.
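The switching behavior described above can be sketched as a simple dispatcher. This is a deliberately simplified, rule-based illustration under stated assumptions: the `HybridAgent` class, the `api_tools` registry, and the stub functions are all hypothetical names, and the article does not say how the researchers' agent actually decides between modes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Result:
    mode: str    # "api" if an API call answered the task, otherwise "browse"
    answer: str


class HybridAgent:
    """Try a site's API first; fall back to browsing when no API can help."""

    def __init__(
        self,
        api_tools: Dict[str, Callable[[str], Optional[str]]],
        browse: Callable[[str, str], str],
    ) -> None:
        self.api_tools = api_tools  # site name -> API handler (hypothetical)
        self.browse = browse        # GUI-style fallback (hypothetical)

    def run(self, site: str, task: str) -> Result:
        tool = self.api_tools.get(site)
        if tool is not None:
            answer = tool(task)
            if answer is not None:  # the API supported this task
                return Result("api", answer)
        # No API for this site, or the API could not handle the task.
        return Result("browse", self.browse(site, task))


def stub_issue_api(task: str) -> Optional[str]:
    # Stub: only one task is "supported" by this pretend API.
    return "2 open issues" if task == "count open issues" else None


def stub_browse(site: str, task: str) -> str:
    # Stub standing in for a click-and-scroll browsing routine.
    return f"browsed {site} for: {task}"


if __name__ == "__main__":
    agent = HybridAgent({"gitlab": stub_issue_api}, stub_browse)
    print(agent.run("gitlab", "count open issues"))
    print(agent.run("gitlab", "change avatar"))
    print(agent.run("shop", "find cheapest mug"))
```

Returning `None` from an API handler is the design choice that matters here: it lets a partially documented API hand the task back to the browsing path instead of failing outright.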

In tests conducted on the WebArena benchmark, a simulation of real-world web tasks, the hybrid agent consistently outperformed traditional browsing agents, achieving an average accuracy of 35.8% and a success rate improvement of over 20% in complex tasks. On GitLab, for example, the hybrid agent achieved a completion rate of 44.4%, compared to 12.8% for browsing-only agents. The hybrid model also proved notably efficient on sites with high API availability, such as GitLab and Map services, completing tasks more quickly and with fewer navigation steps. These results demonstrate the potential of a hybrid approach for achieving state-of-the-art results.
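A percentage improvement can be read as an absolute gap (percentage points) or as a relative ratio, and the two differ considerably. The short sketch below works through both readings using only the two GitLab figures quoted in the article; the function and variable names are illustrative.

```python
def compare_rates(hybrid_pct: float, browsing_pct: float) -> tuple[float, float]:
    """Return (absolute gap in percentage points, hybrid-to-browsing ratio)."""
    gap_points = round(hybrid_pct - browsing_pct, 1)
    ratio = round(hybrid_pct / browsing_pct, 2)
    return gap_points, ratio


if __name__ == "__main__":
    # GitLab completion rates as reported in the article.
    gap, ratio = compare_rates(44.4, 12.8)
    print(f"absolute gap: {gap} percentage points")
    print(f"relative ratio: {ratio}x")
```

On the GitLab numbers, the absolute gap is 31.6 percentage points, while the relative ratio is roughly 3.5 times the browsing-only rate.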

From these findings, several key insights emerge regarding the hybrid agent’s performance and versatility:

In conclusion, this research highlights a promising advancement in AI-driven web navigation by integrating browsing with API-based approaches. The hybrid model demonstrates that a combined strategy offers superior performance, adaptability, and efficiency over browsing-only agents. This balanced approach allows AI agents to access structured data rapidly while retaining flexibility in web environments that lack comprehensive API support, establishing a new benchmark for web navigation agents.

