Model Predictive Adversarial Imitation Learning for Planning from Observation

cs.AI updates on arXiv.org, July 30, 12:11

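This paper replaces the policy in inverse reinforcement learning with a planning-based agent, unifying inverse reinforcement learning and model predictive control. The resulting formulation enables end-to-end interactive learning of planners from observation-only demonstrations. Beyond benefits in interpretability, complexity, and safety, the authors report significant improvements in sample efficiency, out-of-distribution generalization, and robustness.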

arXiv:2507.21533v1 Announce Type: cross Abstract: Human demonstration data is often ambiguous and incomplete, motivating imitation learning approaches that also exhibit reliable planning behavior. A common paradigm to perform planning-from-demonstration involves learning a reward function via Inverse Reinforcement Learning (IRL) then deploying this reward via Model Predictive Control (MPC). Towards unifying these methods, we derive a replacement of the policy in IRL with a planning-based agent. With connections to Adversarial Imitation Learning, this formulation enables end-to-end interactive learning of planners from observation-only demonstrations. In addition to benefits in interpretability, complexity, and safety, we study and observe significant improvements on sample efficiency, out-of-distribution generalization, and robustness. The study includes evaluations in both simulated control benchmarks and real-world navigation experiments using few-to-single observation-only demonstrations.
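To make the "deploy a reward via MPC" half of the common paradigm concrete, here is a minimal sketch of a random-shooting planner in Python. It is not the paper's method: the abstract does not specify the planner, the dynamics model, or the reward parameterization, so everything below is an assumption made for illustration. The 1-D `dynamics`, the hand-written `reward` (a stand-in for a reward that IRL would learn from observations), and the names `plan` and `rollout` are all hypothetical.

```python
import random


def dynamics(state, action):
    """Toy 1-D system (assumed): the action is a bounded step added to the position."""
    return state + action


def reward(state, next_state, goal=5.0):
    """Hand-written stand-in for a learned reward over (state, next state) pairs.

    It depends only on observations, not on actions, mirroring the
    observation-only setting described in the abstract.
    """
    return -abs(next_state - goal)


def plan(state, rng, horizon=5, num_samples=200):
    """Random-shooting MPC: sample action sequences, score each one under the
    model and reward, and return the first action of the best sequence."""
    best_return, best_action = float("-inf"), 0.0
    for _ in range(num_samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s_next = dynamics(s, a)
            total += reward(s, s_next)
            s = s_next
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action


def rollout(start=0.0, steps=20, seed=0):
    """Receding-horizon control: replan at every step, apply only the first action."""
    rng = random.Random(seed)
    state = start
    trajectory = [state]
    for _ in range(steps):
        state = dynamics(state, plan(state, rng))
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    traj = rollout()
    print(f"start={traj[0]:.2f}, end={traj[-1]:.2f}")
```

In the two-stage paradigm, `reward` would be fixed after IRL and only then handed to a planner like `plan`. The formulation in the abstract instead places the planner inside the IRL loop in place of the policy, which this sketch does not attempt to reproduce.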

Related tags

Inverse reinforcement learning, model predictive control, planning agent, sample efficiency, generalization