MarkTechPost@AI, October 22, 2024
Google DeepMind Introduces Diffusion Model Predictive Control (D-MPC): Combining Multi-Step Action Proposals and Dynamics Models Using Diffusion Models for Online MPC

Google DeepMind introduces Diffusion Model Predictive Control (D-MPC), which combines multi-step action proposals and dynamics models, using diffusion models for online model predictive control. D-MPC performs strongly on the D4RL benchmark and can adapt to novel dynamics and optimize new rewards; its key components are effective individually and even more powerful when combined. Experiments evaluate its effectiveness across several dimensions.

🎯 D-MPC aims to maximize an objective function over a planning horizon by using a dynamics model and a planner to select actions. It is flexible enough to adapt to new reward functions at test time, unlike policy learning methods tied to a fixed reward.

💡 Diffusion models learn world dynamics and action sequence proposals from offline data to improve MPC. The "sample, score, and rank" (SSR) method simplifies action selection, offering a simple alternative to more complex optimization techniques.

🚀 D-MPC, introduced by Google DeepMind, combines multi-step action proposals and dynamics models for online MPC. On the D4RL benchmark, it outperforms existing model-based offline planning methods and is competitive with state-of-the-art reinforcement learning methods.

🔍 The method first learns a dynamics model, an action proposal, and a heuristic value function from an offline dataset of trajectories; during planning, the system alternates between taking actions and using the planner to generate the next action sequence. The SSR planner samples multiple action sequences, evaluates them with the learned models, and selects the best option.

📈 Experiments evaluate D-MPC's effectiveness in several areas, including performance improvements over offline MPC methods, adaptability to new rewards and dynamics, and distillation into fast reactive policies.

Model Predictive Control (MPC), or receding horizon control, aims to maximize an objective function over a planning horizon by leveraging a dynamics model and a planner to select actions. The flexibility of MPC allows it to adapt to novel reward functions at test time, unlike policy learning methods that focus on a fixed reward. Diffusion models learn world dynamics and action sequence proposals from offline data to improve MPC. A “sample, score, and rank” (SSR) method refines action selection, offering a simple alternative to more complex optimization techniques.
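To make the objective concrete, the following is a minimal statement of the receding-horizon problem described above. The notation is illustrative and not taken from the paper: H is the planning horizon, p̂ the learned dynamics model, r the reward, and V an optional terminal value heuristic.

```latex
\[
a^{*}_{t:t+H-1} \;=\; \arg\max_{a_{t:t+H-1}} \;
\mathbb{E}_{\,s_{t+h+1}\sim \hat{p}(\cdot \mid s_{t+h},\, a_{t+h})}
\!\left[\; \sum_{h=0}^{H-1} r(s_{t+h}, a_{t+h}) \;+\; V(s_{t+H}) \right]
\]
```

Only the first action of the optimized sequence is executed before the problem is re-solved at the next step, which is why a new reward r can be swapped in at test time without retraining the models.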

Model-based methods use dynamics models, with Dyna-style techniques learning policies online or offline, and MPC approaches utilizing models for runtime planning. Diffusion-based methods like Diffuser and Decision Diffuser apply joint trajectory models to predict state-action sequences. Some methods factorize the dynamics and action proposals for added flexibility. Multi-step diffusion modeling allows these approaches to generate trajectory-level predictions, improving their ability to adapt to new environments and rewards. Compared to more complex trajectory optimization approaches, these methods often simplify planning or policy generation.

Researchers from Google DeepMind introduced Diffusion Model Predictive Control (D-MPC), an approach that integrates multi-step action proposals and dynamics models using diffusion models for online MPC. On the D4RL benchmark, D-MPC outperforms existing model-based offline planning methods and competes with state-of-the-art reinforcement learning methods. D-MPC also adapts to novel dynamics and optimizes new rewards at runtime. The key elements, including multi-step dynamics, action proposals, and an SSR planner, are individually effective and even more powerful when combined.

The proposed method involves a multi-step diffusion-based extension of model-based offline planning. Initially, it learns the dynamics model, action proposals, and a heuristic value function from an offline dataset of trajectories. During planning, the system alternates between taking actions and generating the next sequence of actions using a planner. The SSR planner samples multiple action sequences, evaluates them using the learned models, and selects the best option. This approach adapts easily to new reward functions and can be fine-tuned for changing dynamics using small amounts of new data.
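Here is a minimal sketch of that "sample, score, and rank" planning loop. The names propose_actions, rollout_dynamics, reward_fn, value_fn, and the gym-style env are placeholders standing in for the learned diffusion action proposal, the learned diffusion dynamics model, the reward, the heuristic value function, and the environment; they are illustrative assumptions, not APIs from the paper.

```python
def ssr_plan(state, propose_actions, rollout_dynamics, reward_fn, value_fn,
             num_samples=32, horizon=16):
    """'Sample, score, and rank': draw several action sequences, score each
    imagined trajectory, and return the highest-scoring plan."""
    best_score, best_plan = float("-inf"), None
    for _ in range(num_samples):
        # Sample a multi-step action sequence from the learned (diffusion) proposal.
        actions = propose_actions(state, horizon)   # e.g. list of `horizon` actions
        # Imagine the resulting states with the learned (diffusion) dynamics model.
        states = rollout_dynamics(state, actions)   # e.g. list of `horizon` states
        # Score: summed per-step rewards plus a heuristic value of the final state.
        score = sum(reward_fn(s, a) for s, a in zip(states, actions)) + value_fn(states[-1])
        if score > best_score:
            best_score, best_plan = score, actions
    return best_plan


def control_loop(env, planner_kwargs, max_steps=1000):
    """Receding-horizon control: execute the first planned action, then replan."""
    state = env.reset()
    for _ in range(max_steps):
        plan = ssr_plan(state, **planner_kwargs)
        state, reward, done, info = env.step(plan[0])  # take only the first action
        if done:
            break
```

Because the proposal and dynamics models are learned separately from the reward used for scoring, swapping in a new reward_fn at test time changes the ranking without retraining either model, which is the flexibility the article emphasizes.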

The experiments evaluate D-MPC’s effectiveness in several areas: performance improvement over offline MPC methods, adaptability to new rewards and dynamics, and distillation into fast reactive policies. Tested on D4RL locomotion, Adroit, and Franka Kitchen tasks, D-MPC outperforms methods like MBOP and closely rivals others such as Diffuser and IQL. Notably, it generalizes well to novel rewards and adapts to hardware defects, improving performance after fine-tuning. Ablation studies show that using multi-step diffusion models for both action proposals and dynamics significantly enhances long-horizon prediction accuracy and overall task performance compared to single-step or transformer models.

In conclusion, the study introduced D-MPC, which enhances MPC by using diffusion models for multi-step action proposals and dynamics predictions. D-MPC reduces compounding errors and demonstrates strong performance on the D4RL benchmark, surpassing current model-based planning methods and competing with state-of-the-art reinforcement learning approaches. It excels at adapting to new rewards and dynamics at runtime, but it requires replanning at each step, which is slower than reactive policies. Future work will focus on speeding up sampling and extending D-MPC to handle pixel observations using latent representation techniques.


Check out the Paper. All credit for this research goes to the researchers of this project.
