MarkTechPost@AI August 16, 2024
Agent Q: A New AI Framework for Autonomous Improvement of Web-Agents with Limited Human Supervision, with a 340% Improvement over LLaMa 3's Baseline Zero-Shot Performance

Agent Q is a new AI framework designed to address the challenges LLMs face in dynamic tasks; it combines multiple techniques to improve the model's performance in real-world applications.

🎯Agent Q was developed to tackle the challenges LLMs face in multi-step reasoning and dynamic, interactive environments. Built on LLaMa 3, it fuses advanced search techniques, self-critique, and reinforcement learning, changing how LLMs interact with the web. By combining guided Monte Carlo Tree Search with an off-policy Direct Preference Optimization algorithm, it enables the model to learn from both successful and unsuccessful trajectories, significantly improving its generalization in complex multi-step reasoning tasks.

🌟Agent Q's innovative architecture comprises several key components that enhance its performance in interactive environments. Guided Monte Carlo Tree Search autonomously explores different actions and web pages, effectively balancing exploration and exploitation. A self-critique mechanism provides real-time feedback at each decision step, helping to refine the reasoning process; this is especially important in long-horizon tasks, where sparse rewards can hinder learning. The Direct Preference Optimization algorithm fine-tunes the model by constructing preference pairs from data generated by the tree search, enabling the agent to learn effectively from both successful and sub-optimal actions.

🎉Agent Q's results in real-world scenarios are remarkable. In a series of booking experiments on OpenTable, after just one day of autonomous data collection, Agent Q raised LLaMa 3's baseline zero-shot performance from 18.6% to an astonishing 81.7%, a roughly 340% relative improvement; with further online search, the success rate climbed to 95.4%, demonstrating its capacity for autonomous improvement and adaptation.

Large Language Models (LLMs) have achieved remarkable progress in the ever-expanding realm of artificial intelligence, revolutionizing natural language processing and interaction. Yet even the most sophisticated LLMs, like LLaMa 3, face substantial challenges in tasks requiring multi-step reasoning and decision-making in dynamic, interactive environments. Traditional training methodologies, heavily reliant on static datasets, fail to prepare these models for real-world applications, particularly in web navigation, where adaptability and complex reasoning are paramount. MultiOn researchers introduced Agent Q, a groundbreaking autonomous web agent developed to address these challenges. Built upon the foundation of LLaMa 3, Agent Q combines advanced search techniques, self-critique, and reinforcement learning, transforming how LLMs navigate and interact with the web. By pushing the boundaries of autonomous agents, Agent Q sets a new standard for real-world AI applications.

Traditional approaches to training LLMs for dynamic tasks typically involve supervised fine-tuning on curated datasets. While effective in controlled scenarios, these methods often fall short in complex environments that demand multi-step reasoning and adaptive learning. The main issue lies in their tendency to produce suboptimal results due to compounding errors and limited exploration.

Agent Q is a cutting-edge framework designed to overcome these challenges by integrating advanced search techniques, self-critique mechanisms, and reinforcement learning. Unlike conventional methods that rely heavily on supervised fine-tuning, Agent Q employs a combination of guided Monte Carlo Tree Search (MCTS) and an off-policy variant of the Direct Preference Optimization (DPO) algorithm. This approach allows LLM agents to learn from successful and unsuccessful trajectories, significantly improving their generalization capabilities in complex, multi-step reasoning tasks. By leveraging these methodologies, Agent Q addresses the shortcomings of existing models and sets a new benchmark for autonomous web agents.
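To ground the DPO side of this in something concrete, below is a minimal PyTorch sketch of the standard pairwise DPO objective. The paper's off-policy, trajectory-level variant is not available as public code, so the tensor shapes and the `beta` value here are illustrative assumptions rather than Agent Q's actual implementation.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard pairwise DPO objective (illustrative sketch).

    Each tensor holds the summed log-probability of a full trajectory
    under the trainable policy or the frozen reference model.
    """
    # Log-ratios measure how far the trainable policy has moved away
    # from the frozen reference model on each trajectory.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # The loss rewards widening the margin between the preferred (chosen)
    # and dispreferred (rejected) trajectories; beta controls how much
    # the policy may drift from the reference.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(margin).mean()
```

Because both successful and unsuccessful trajectories appear as the "chosen" or "rejected" member of some pair, the agent extracts signal from failures as well as successes, which is what distinguishes this setup from supervised fine-tuning on successful demonstrations alone.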

The innovative architecture of Agent Q consists of several key components that enhance its performance in interactive environments. Guided MCTS plays a crucial role by autonomously exploring different actions and web pages, effectively balancing exploration and exploitation. This technique generates diverse and optimal trajectories essential for training robust agents. Additionally, the self-critique mechanism provides real-time feedback at each decision-making step, allowing the agent to refine its reasoning process. This feedback loop is particularly important for long-horizon tasks, where sparse rewards can hinder learning. Furthermore, the DPO algorithm fine-tunes the model by constructing preference pairs from the data generated during MCTS, enabling the agent to learn effectively from both successful and sub-optimal actions.
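As a rough illustration of two of these components, the sketch below shows a textbook UCB1 selection rule, the exploration-exploitation balance that MCTS relies on, and one simple way to turn sibling branches of a search node into (chosen, rejected) pairs for DPO. The node representation and helper names are assumptions for illustration; Agent Q's guided variant additionally steers selection with the LLM's own action priors and scores branches using a mix of MCTS value estimates and self-critique feedback.

```python
import math

def ucb1(mean_value: float, visits: int, parent_visits: int,
         c: float = 1.4) -> float:
    """Textbook UCB1 score: exploitation (mean value so far) plus an
    exploration bonus that shrinks as an action is revisited."""
    if visits == 0:
        return float("inf")  # always try an unexplored action first
    exploration = c * math.sqrt(math.log(max(parent_visits, 1)) / visits)
    return mean_value + exploration

def preference_pairs(siblings: list[tuple[str, float]]) -> list[tuple[str, str]]:
    """Pair the highest-value branch at a node against each strictly
    lower-value sibling, yielding (chosen, rejected) actions for DPO.
    The (action, mean_value) representation is a simplified assumption."""
    if len(siblings) < 2:
        return []
    ranked = sorted(siblings, key=lambda s: s[1], reverse=True)
    best_action, best_value = ranked[0]
    return [(best_action, action) for action, value in ranked[1:]
            if value < best_value]
```

For example, if a node explored three candidate web actions with mean values {"click_book": 0.8, "scroll": 0.3, "go_back": 0.1}, this scheme would emit the pairs (click_book, scroll) and (click_book, go_back), which then feed the DPO loss sketched earlier.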

The results of Agent Q's application in real-world scenarios are nothing short of extraordinary. In a series of booking experiments on OpenTable, Agent Q improved the baseline zero-shot performance of LLaMa 3 from 18.6% to an astounding 81.7% after just one day of autonomous data collection, a roughly 340% relative improvement, since (81.7 - 18.6) / 18.6 ≈ 3.4. With further online search, this success rate climbed to 95.4%. These impressive results highlight Agent Q's ability to autonomously improve and adapt, setting a new standard for autonomous web agents.

In conclusion, Agent Q represents a monumental leap forward in developing autonomous web agents. By addressing the limitations of traditional LLM training methodologies, Agent Q introduces a novel framework that combines advanced search techniques, AI self-critique, and reinforcement learning. This approach enhances the agent’s decision-making capabilities and allows it to improve continuously in real-world, dynamic environments. With its impressive performance and potential for further development, Agent Q sets a new benchmark for what is possible in autonomous web navigation, paving the way for more intelligent and adaptable AI agents.


Check out the Paper and Details. All credit for this research goes to the researchers of this project.
