MarkTechPost@AI 2024年09月14日
Automating Reinforcement Learning Workflows with Vision-Language Models: Towards Autonomous Mastery of Robotic Tasks

This article covers how large vision-language models and language models are advancing reinforcement learning and robotics, and introduces an innovative agent architecture for automating experimental workflows.

🎯 Large vision-language models and language models have had a major impact on reinforcement learning and robotics, reducing the need for domain-specific knowledge, but the experimental workflow for training policies still requires human intervention.

💡 To address this, LLM-empowered agents have been developed to assist with software engineering and scientific research tasks; however, most existing work focuses on automating individual steps or specific domains, so integrated systems covering the entire experimental pipeline are still needed.

🚀 DeepMind researchers propose an innovative agent architecture that uses a VLM to automate key aspects of the RL experiment workflow, including monitoring and analyzing experiment progress, proposing new tasks, decomposing tasks into subtasks, and retrieving the skills required for execution.

🛠️ The researchers implemented the system's components and applied them to a simulated robotic manipulation task. The architecture comprises a curriculum module, an embodiment module, and an analysis module, which interact through a chat interface; policies are trained with a PAC model.

Recent advancements in utilizing large vision-language models (VLMs) and large language models (LLMs) have significantly impacted reinforcement learning (RL) and robotics. These models have demonstrated their utility in learning robot policies, high-level reasoning, and automating the generation of reward functions for policy learning. This progress has notably reduced the need for domain-specific knowledge typically required from RL researchers.

However, despite these advancements, many steps within the experimental workflow of training policies via RL still necessitate human intervention. These steps include determining when an experiment has concluded and constructing task curricula to facilitate learning target tasks. While some research has attempted to automate individual steps in this process, such as automated training and evaluation of standard machine learning tasks or automated curriculum building, these approaches often consider each step in isolation, utilizing models specifically trained for a single task. The challenge remains to develop a more holistic, automated approach that can seamlessly integrate these various steps in the RL experimental workflow, reducing the need for human intervention across the entire process.

In the realm of science and engineering automation, LLM-empowered agents are being developed to assist in software engineering tasks, from interactive pair-programming to end-to-end software development. Similarly, in scientific research, LLM-based agents are being employed to generate research directions, analyze literature, automate scientific discovery, and conduct machine learning experiments. For embodied agents, particularly in robotics, LLMs are being utilized to write policy code, decompose high-level tasks into subtasks, and even propose tasks for open-ended exploration. Notable examples include the Voyager agent for Minecraft and systems like CaP and SayCan for robotics tasks. These approaches demonstrate the potential of LLMs in automating complex reasoning and decision-making processes in physical environments. However, most existing work focuses on automating individual steps or specific domains. The challenge remains in developing integrated systems that can automate entire experimental workflows, particularly in reinforcement learning for robotics, where task proposal, decomposition, execution, and evaluation need to be seamlessly combined.

DeepMind Researchers propose an innovative agent architecture that automates key aspects of the RL experiment workflow, aiming to enable automated mastery of control domains for embodied agents. This system utilizes a VLM to perform tasks typically handled by human experimenters, including:

1. Monitoring and analyzing experiment progress

2. Proposing new tasks based on the agent’s past successes and failures

3. Decomposing tasks into sequences of subtasks (skills)

4. Retrieving appropriate skills for execution

This approach enables the system to build automated curricula for learning, representing one of the first proposals for a system that utilizes a VLM throughout the entire RL experiment cycle.
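The four responsibilities listed above can be sketched as a simple agent loop. This is an illustrative mock, not the paper's implementation: the VLM call is stubbed out with a hypothetical `stub_vlm` function, whereas the real system queries a multimodal model such as Gemini with environment images.

```python
def stub_vlm(prompt: str) -> str:
    """Placeholder for a vision-language model query (hypothetical)."""
    if "propose" in prompt:
        return "stack red block on blue block"
    if "decompose" in prompt:
        return "pick up red block; place red block on blue block"
    return "in_progress"

class ExperimentAgent:
    def __init__(self, vlm, known_skills):
        self.vlm = vlm
        self.skills = set(known_skills)  # growing skill library

    def propose_task(self, history):
        # Propose a new task based on past successes and failures.
        return self.vlm(f"propose next task given history: {history}")

    def decompose(self, task):
        # Decompose the task into a sequence of subtasks (skills).
        return [s.strip() for s in self.vlm(f"decompose: {task}").split(";")]

    def retrieve(self, steps):
        # Retrieve only the steps that map onto known skills.
        return [s for s in steps if s in self.skills]

agent = ExperimentAgent(
    stub_vlm,
    ["pick up red block", "place red block on blue block"],
)
task = agent.propose_task(history=[])
steps = agent.decompose(task)
print(agent.retrieve(steps))
```

In the full system, tasks whose steps all map to known skills are dispatched for execution, while unmapped steps can seed new skill-learning experiments.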

The researchers have developed a prototype of this system, using a standard Gemini model without additional fine-tuning. This model provides a curriculum of skills to a language-conditioned Actor-Critic algorithm, guiding data collection to aid in learning new skills. The data collected through this method is effective for learning and iteratively improving control policies in a robotics domain. Further examination of the system’s ability to build a growing library of skills and assess the progress of skill training has yielded promising results. This suggests that the proposed architecture offers a potential blueprint for fully automated mastery of tasks and domains for embodied agents, marking a significant step towards more autonomous and efficient reinforcement learning systems in robotics.

To explore the feasibility of their proposed system, the researchers implemented its components and applied them to a simulated robotic manipulation task. The system architecture consists of several interacting modules:

1. Curriculum Module: This module retrieves images from the environment and incorporates them into goal proposal prompts. It decomposes goals into steps and retrieves skill captions. If all steps can be mapped to known skills, the skill sequence is sent to the embodiment module. 

2. Embodiment Module: This uses a text-conditioned learned policy (Perceiver-Actor-Critic algorithm) to execute the skill sequences. Multiple instances of this module can perform episode rollouts simultaneously.

3. Analysis Module: This module operates outside the experiment loop, evaluating training runs to determine convergence points.

The modules interact through a chat-based interface in a Google Meet session, allowing for easy connection and human introspection. The curriculum module controls the program flow, changing skills at fixed intervals during rollouts.
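The chat-based coordination described above can be sketched as a shared message channel that the modules read from and post to. This is a minimal stand-in for the Google Meet interface; the module behaviors and skill names here are illustrative.

```python
import queue

# A shared chat channel standing in for the chat-based interface.
chat = queue.Queue()

def curriculum_module():
    # The curriculum module drives program flow: it posts a skill
    # sequence for the embodiment module to execute.
    chat.put(("curriculum", "execute", ["lift", "stack"]))

def embodiment_module():
    sender, cmd, skills = chat.get()
    # Execute each skill for a fixed interval, then report results
    # back on the same channel.
    results = {skill: "success" for skill in skills}
    chat.put(("embodiment", "report", results))

curriculum_module()
embodiment_module()
report = chat.get()
print(report)
```

Because every message passes through one channel, a human can observe or intervene at any point, which matches the introspection benefit the authors highlight.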

For policy training, the system employs a Perceiver-Actor-Critic (PAC) model, which can be trained via offline reinforcement learning and is text-conditioned. This allows for the use of non-expert exploration data and relabeling of data with multiple reward functions. The high-level system utilizes a standard Gemini 1.5 Pro model, with prompts designed using the OneTwo Python library. The prompts include a small number of hand-designed exemplars with image data from previous experiments, covering proposal, decomposition, retrieval, and analysis tasks. This implementation demonstrates a practical approach to integrating VLMs into the RL workflow, enabling automated task proposal, decomposition, and execution in a simulated robotic environment.
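The relabeling idea mentioned above can be illustrated concretely: one offline trajectory becomes a training example for several text-conditioned tasks by scoring it under multiple reward functions. The reward functions and observation fields below are hypothetical, chosen only to make the mechanism concrete.

```python
# Hypothetical reward functions for two task captions.
def reward_lifted(obs):
    # Reward if the block is raised above a height threshold.
    return 1.0 if obs["block_height"] > 0.1 else 0.0

def reward_stacked(obs):
    # Reward if one block rests on top of another.
    return 1.0 if obs["on_top"] else 0.0

# A single non-expert exploration episode (toy observations).
episode = [
    {"block_height": 0.0,  "on_top": False},
    {"block_height": 0.15, "on_top": False},
    {"block_height": 0.05, "on_top": True},
]

# Relabel the same trajectory under each task's reward function,
# yielding training data for a text-conditioned policy.
relabeled = {
    "lift the block":   [reward_lifted(o) for o in episode],
    "stack the blocks": [reward_stacked(o) for o in episode],
}
print(relabeled)
```

This is what lets offline RL reuse the same exploration data across an expanding set of skills: no new environment interaction is needed to add a reward signal for a new task caption.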

The researchers evaluated their approach using a robotic block stacking task involving a 7-DoF Franka Panda robot in a MuJoCo simulator. They first trained a PAC model with 140M parameters on basic skills using a pre-existing dataset of 1M episodes. The Gemini-driven data collection process then generated 25k new episodes, exploring different VLM sampling temperatures and skill sets. The analysis module was used to determine optimal early stopping points and assess skill convergence. The curriculum module’s ability to work with a growing skill set was examined at various points in the experiment, demonstrating the system’s capacity for progressive learning and task decomposition.

The researchers have proposed an innovative agent architecture for reinforcement learning that utilizes a VLM to automate tasks typically performed by human experimenters. This system aims to enable embodied agents to autonomously acquire and master an expanding set of skills. The prototype implementation demonstrated several key capabilities.

Despite some simplifications in the prototype, the system successfully collected diverse data for self-improvement of the control policy and learned new skills beyond its initial set. The curriculum showed adaptability in proposing tasks based on available skill complexity.


