MarkTechPost@AI December 5, 2024
ServiceNow Releases AgentLab: A New Open-Source Python Package for Developing and Evaluating Web Agents

ServiceNow has released an open-source Python package called AgentLab, designed to simplify the development and evaluation of web agents. Built on BrowserGym, AgentLab provides an environment for training and testing agents on a range of web benchmarks, including WebArena. It integrates Ray to simplify running large-scale parallel experiments, and it offers a unified LLM API that works seamlessly with language models from providers such as OpenAI, Azure, and OpenRouter. AgentLab aims to improve the efficiency and reproducibility of web-agent development, giving both individual researchers and enterprise teams more accessible tooling.

🤔 AgentLab is an open-source Python package designed to simplify the development and evaluation of web agents, addressing the scalability, reproducibility, and integration problems of existing frameworks.

🚀 Built on BrowserGym, AgentLab supports training and testing agents in ten benchmark environments, including WebArena; these benchmarks simulate real-world web environments and cover a wide variety of web tasks.

💡 AgentLab integrates the Ray library, making it easy to run large-scale parallel experiments — especially useful when testing multiple agent configurations or training agents across different environments.

🌐 AgentLab provides a unified LLM API supporting OpenAI, Azure, OpenRouter, and other language-model providers, as well as self-hosted models served with TGI, letting developers flexibly choose and switch between models.

📊 AgentLab offers a unified leaderboard for comparing agent performance across multiple tasks, fostering community-driven agent benchmarking.

Developing web agents is a challenging area of AI research that has attracted significant attention in recent years. As the web becomes more dynamic and complex, it demands advanced capabilities from agents that interact autonomously with online platforms. One of the major challenges in building web agents is effectively testing, benchmarking, and evaluating their behavior in diverse and realistic online environments. Many existing frameworks for agent development have limitations such as poor scalability, difficulty in conducting reproducible experiments, and challenges in integrating with various language models and benchmark environments. Additionally, running large-scale, parallel experiments has often been cumbersome, especially for teams with limited computational resources or fragmented tools.

ServiceNow addresses these challenges by releasing AgentLab, an open-source package designed to simplify the development and evaluation of web agents. AgentLab offers a range of tools to streamline the process of creating web agents capable of navigating and interacting with various web platforms. Built on top of BrowserGym, another recent development from ServiceNow, AgentLab provides an environment for training and testing agents across a variety of web benchmarks, including the popular WebArena. With AgentLab, developers can run large-scale experiments in parallel, allowing them to evaluate and improve their agents’ performance across different tasks more efficiently. The package aims to make the agent development process more accessible for both individual researchers and enterprise teams.

Technical Details

AgentLab is designed to address common pain points in web agent development by offering a unified and flexible framework. One of its standout features is the integration with Ray, a library for parallel and distributed computing, which simplifies running large-scale parallel experiments. This feature is particularly useful for researchers who want to test multiple agent configurations or train agents across different environments simultaneously.
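The fan-out pattern that Ray enables can be sketched with the standard library alone. In the toy sketch below, `ThreadPoolExecutor` stands in for Ray's distributed scheduler, and `run_experiment` is a hypothetical placeholder for a real agent rollout — none of these names are AgentLab's actual API:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def run_experiment(agent_name, task):
    # Placeholder: a real run would launch a browser environment,
    # let the agent act, and return its score on the task.
    return {"agent": agent_name, "task": task, "reward": 1.0}

agents = ["baseline_agent", "cot_agent"]                  # hypothetical configs
tasks = ["miniwob.click-button", "miniwob.enter-text"]    # hypothetical task ids

# Fan every (agent, task) pair out to workers, as Ray would across a cluster.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda cfg: run_experiment(*cfg),
                            product(agents, tasks)))

print(len(results))  # one result record per (agent, task) pair
```

Ray generalizes this same pattern beyond a single process, scheduling work across cores or machines, which is what makes sweeping many agent configurations practical.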

AgentLab also provides essential building blocks for creating agents using BrowserGym, which supports ten different benchmarks. These benchmarks serve as standardized environments to test agent capabilities, including WebArena, which evaluates agents’ performance on web-based tasks that require human-like interaction.
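These benchmark environments follow the familiar reset/step episode loop. The sketch below illustrates that interface in miniature; the class, action strings, and observation fields are invented for illustration and are not BrowserGym's actual API:

```python
class ToyWebEnv:
    """Minimal stand-in for a benchmark web environment (illustrative only)."""
    def __init__(self, goal):
        self.goal = goal

    def reset(self):
        # Return the initial observation: the task goal and a page snapshot.
        return {"goal": self.goal, "dom": "<button id='ok'>OK</button>"}

    def step(self, action):
        # Reward the agent when it performs the target interaction.
        done = action == "click('ok')"
        reward = 1.0 if done else 0.0
        return {"dom": "..."}, reward, done

def run_episode(env, policy, max_steps=10):
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done = env.step(policy(obs))
        total += reward
        if done:
            break
    return total

# A trivial policy that always clicks the OK button solves this toy task.
score = run_episode(ToyWebEnv("press OK"), lambda obs: "click('ok')")
```

Real benchmarks differ in page complexity and scoring, but agents plug into them through this same observe-act-reward loop.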

Another key advantage is the Unified LLM API offered by AgentLab. This API allows seamless integration with language models from providers such as OpenAI, Azure, and OpenRouter, and it also supports self-hosted models served with Text Generation Inference (TGI). This flexibility lets developers choose and switch between different large language models (LLMs) without additional configuration, speeding up agent development. The unified leaderboard feature also adds value by providing a consistent way to compare agents’ performance across multiple tasks. Furthermore, AgentLab emphasizes reproducibility, offering built-in tools to help developers recreate experiments accurately, which is crucial for validating results and improving agent robustness.
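The idea behind a unified LLM API is the classic adapter pattern: every backend exposes the same interface, so agent code never changes when the provider does. The sketch below shows that pattern with stubbed backends — these classes are illustrative inventions, not AgentLab's actual implementation:

```python
from abc import ABC, abstractmethod

class ChatModel(ABC):
    """Common interface every model backend implements."""
    @abstractmethod
    def chat(self, messages): ...

class OpenAIBackend(ChatModel):
    def chat(self, messages):
        # Stub; a real backend would call the provider's chat endpoint.
        return f"[openai] reply to: {messages[-1]}"

class TGIBackend(ChatModel):
    def chat(self, messages):
        # Stub; a real backend would call a self-hosted TGI server.
        return f"[tgi] reply to: {messages[-1]}"

def build_model(provider: str) -> ChatModel:
    # Switching providers is a one-string change for the caller.
    return {"openai": OpenAIBackend, "tgi": TGIBackend}[provider]()

model = build_model("tgi")
print(model.chat(["Click the OK button"]))
```

Because the agent only ever sees the `ChatModel` interface, swapping OpenAI for a self-hosted TGI model requires no changes to agent logic.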

Since its release, AgentLab has proven effective in helping developers scale up the process of creating and evaluating web agents. By leveraging Ray, users have been able to conduct large-scale parallel experiments that would have otherwise required extensive manual setup and substantial computational resources. BrowserGym, which serves as the foundation for AgentLab, has supported experimentation across ten benchmarks, including WebArena—a benchmark designed to test agent performance in dynamic web environments that mimic real-world websites.

Developers using AgentLab have reported improvements in both the efficiency and effectiveness of their experiments, especially when leveraging the Unified LLM API to switch between different language models seamlessly. These features not only accelerate development but also provide meaningful comparisons through a unified leaderboard, offering insights into the strengths and weaknesses of different web agent architectures.
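A leaderboard of this kind boils down to aggregating per-task rewards by agent and ranking the averages. The sketch below shows that aggregation; the result-record shape is hypothetical, not AgentLab's actual format:

```python
from collections import defaultdict

# Hypothetical experiment records: one reward per (agent, task) run.
results = [
    {"agent": "A", "task": "t1", "reward": 1.0},
    {"agent": "A", "task": "t2", "reward": 0.0},
    {"agent": "B", "task": "t1", "reward": 1.0},
    {"agent": "B", "task": "t2", "reward": 1.0},
]

def leaderboard(records):
    # Average each agent's reward across tasks, then rank best-first.
    scores = defaultdict(list)
    for r in records:
        scores[r["agent"]].append(r["reward"])
    return sorted(((agent, sum(v) / len(v)) for agent, v in scores.items()),
                  key=lambda pair: -pair[1])

board = leaderboard(results)
```

Averaging across a shared task set is what makes cross-agent comparisons on a leaderboard meaningful: every agent is scored on the same tasks.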

Conclusion

ServiceNow’s AgentLab is a thoughtful open-source package for developing and evaluating web agents, addressing key challenges in this field. By integrating BrowserGym, Ray, and a Unified LLM API, AgentLab simplifies large-scale experimentation and benchmarking while ensuring consistency and reproducibility. The flexibility to switch between different language models and the ability to run extensive experiments in parallel make AgentLab a valuable tool for both individual developers and larger research teams.

Features like the unified leaderboard help standardize agent evaluation and foster a community-driven approach to agent benchmarking. As web automation and interaction become increasingly important, AgentLab offers a solid foundation for developing capable, efficient, and adaptable web agents.


Check out the GitHub Page. All credit for this research goes to the researchers of this project.


