MarkTechPost@AI, December 17, 2024
Meta FAIR Releases Meta Motivo: A New Behavioral Foundation Model for Controlling Virtual Physics-based Humanoid Agents for a Wide Range of Complex Whole-Body Tasks

 

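Meta FAIR has released Meta Motivo, a new behavioral foundation model for controlling virtual physics-based humanoid agents. The model is trained with the FB-CPR algorithm on unlabeled behavior data and can solve a range of complex whole-body tasks zero-shot. These include motion tracking, goal reaching, and reward maximization, and its behavior is notably expressive. It has limitations: for example, it struggles with tasks that differ substantially from its motion-capture dataset. Even so, Meta Motivo is an important step for humanoid control, notable both for its performance and for how "human-like" its behavior appears.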

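🤖 FB-CPR algorithm: FB-CPR performs unsupervised reinforcement learning in two ways. It uses forward-backward representations to embed unlabeled trajectories into a shared latent space. It also uses a latent-conditional discriminator that encourages policies to "cover" the states in the dataset.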

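🤸 META MOTIVO model: Building on FB-CPR, Meta developed META MOTIVO, a behavioral foundation model for whole-body humanoid control. It can solve a variety of tasks zero-shot, such as motion tracking, goal reaching, and reward optimization.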

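🎯 Performance evaluation: FB-CPR performs well across several task categories, reaching 73.4% of top-line algorithm performance in the zero-shot setting. On reward-maximization tasks it outperforms unsupervised baselines and achieves 177% of DIFFUSER's performance at lower computational cost.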

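🚶 Human perception: A human evaluation study found that task-specific algorithms may score higher numerically. Even so, raters judged FB-CPR more "human-like" on both reward and goal tasks, and its behavior more natural.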

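🚧 Limitations and future directions: FB-CPR struggles with tasks that differ substantially from the motion-capture dataset and is currently limited to proprioceptive observations. Future work will explore:
- adding more state variables
- richer perception methods
- video datasets of human activity
- more direct techniques for aligning language with policies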

Foundation models pre-trained on large amounts of unlabeled data have become a leading approach for building versatile AI systems that solve complex tasks from targeted prompts. Researchers are now exploring whether this paradigm extends beyond language and vision to behavioral foundation models (BFMs) for agents that interact with dynamic environments. Specifically, this research aims to build a BFM for humanoid agents that performs whole-body control from proprioceptive observations. Humanoid control is a long-standing challenge in robotics and AI because it is high-dimensional and inherently unstable. The goal is a general model that can express diverse behaviors in response to different prompts, including imitation, goal reaching, and reward optimization.

Meta researchers introduce FB-CPR (Forward-Backward representations with Conditional Policy Regularization), an online unsupervised reinforcement learning algorithm that grounds policy learning in unlabeled, observation-only behavior data. Its key technical idea has two parts. First, forward-backward representations embed unlabeled trajectories into a shared latent space. Second, a latent-conditional discriminator encourages policies to "cover" the states in the dataset. To demonstrate the method, the team built META MOTIVO, a behavioral foundation model for whole-body humanoid control. It can be prompted to solve diverse tasks such as motion tracking, goal reaching, and reward optimization in a zero-shot setting. The model uses the SMPL skeleton and the AMASS motion-capture dataset and shows highly expressive behavior.
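Forward-backward models use two embeddings. A backward embedding B(s) maps each state into the latent space. A forward embedding F(s, a, z) is learned so that F(s, a, z)·z acts as the Q-value of the policy conditioned on latent z. Under that standard FB formulation, "prompting" the model amounts to choosing z. The sketch below shows one way each of the three prompt types could become a latent. It is a minimal illustration under stated assumptions, not Meta Motivo's code: `toy_B`, `toy_F`, the window-sum tracking prompt, and the sqrt(d) normalization are all hypothetical. The discrete `greedy_action` search is also hypothetical; in practice a trained actor network plays that role.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
State = Sequence[float]
Action = Sequence[float]


def normalize(z: Vector) -> Vector:
    """Rescale z to norm sqrt(d), an assumed convention for FB latents."""
    norm = math.sqrt(sum(x * x for x in z))
    if norm == 0.0:
        return list(z)
    scale = math.sqrt(len(z)) / norm
    return [x * scale for x in z]


def prompt_reward(B: Callable[[State], Vector], states: Sequence[State],
                  reward: Callable[[State], float]) -> Vector:
    """Reward prompt: sample estimate of z = E_s[r(s) B(s)] over `states`."""
    weighted = [[reward(s) * b for b in B(s)] for s in states]
    z = [sum(col) / len(weighted) for col in zip(*weighted)]
    return normalize(z)


def prompt_goal(B: Callable[[State], Vector], goal: State) -> Vector:
    """Goal prompt: z = B(goal)."""
    return normalize(list(B(goal)))


def prompt_tracking(B: Callable[[State], Vector], motion: Sequence[State],
                    t: int, window: int) -> Vector:
    """Tracking prompt at step t (assumed form): sum of B over the next `window` reference states."""
    future = motion[t + 1 : t + 1 + window]
    z = [sum(col) for col in zip(*(B(s) for s in future))]
    return normalize(z)


def greedy_action(F: Callable[[State, Action, Vector], Vector], s: State,
                  z: Vector, candidates: Sequence[Action]) -> Action:
    """Pick the candidate maximizing Q_z(s, a) = F(s, a, z) . z."""
    def q(a: Action) -> float:
        return sum(f * zi for f, zi in zip(F(s, a, z), z))
    return max(candidates, key=q)


# Hypothetical toy embeddings on 2-D states; the real B and F are neural networks.
def toy_B(s: State) -> Vector:
    return [float(x) for x in s]


def toy_F(s: State, a: Action, z: Vector) -> Vector:
    return [s[0] + a[0], s[1] + a[1]]


motion = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 1.0]]
z_track = prompt_tracking(toy_B, motion, t=0, window=2)
best = greedy_action(toy_F, [0.0, 0.0], z_track, [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
print(best)  # [1.0, 0.0]: the move toward the next reference states
```

The design point is that every prompt type reduces to the same operation: map data into B's space, then act greedily with respect to F(s, a, z)·z. This is why one pre-trained model can be reused zero-shot across tasks.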

The method combines forward-backward (FB) representation learning with conditional policy regularization. During pre-training, the agent has access to an unlabeled behavior dataset of observation-only trajectories. FB-CPR learns a continuous family of latent-conditioned policies, with latent variables drawn from a distribution defined over a latent space. By representing behaviors in the joint space of states and latent variables, the researchers aim to capture diverse motion patterns. The key step is inferring a latent variable for each dataset trajectory with the ER_FB encoder, which maps trajectories into the same representational space as the policies' latents. Training can then regularize the unsupervised learning of the behavioral foundation model. It does so by minimizing the discrepancy between the distribution induced by the latent-conditioned policies and the dataset distribution.
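A toy example can illustrate the regularization step. The sketch below rests on two assumptions. First, a trajectory's latent is taken to be the average of B over its states, rescaled to norm sqrt(d). Second, the regularizer is a GAIL-style reward, log(D / (1 - D)), from a discriminator D(s, z) trained to tell dataset (state, latent) pairs from policy (state, latent) pairs. The linear logistic discriminator, the hand-picked states, and all function names (`embed_trajectory`, `LogisticDiscriminator`, `regularization_reward`, `identity_B`) are hypothetical simplifications. They do not reproduce FB-CPR's actual networks or training loop.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
State = Sequence[float]


def normalize(z: Vector) -> Vector:
    """Rescale z to norm sqrt(d), an assumed convention for FB latents."""
    norm = math.sqrt(sum(x * x for x in z))
    return [x * math.sqrt(len(z)) / norm for x in z] if norm > 0 else list(z)


def embed_trajectory(B: Callable[[State], Vector], trajectory: Sequence[State]) -> Vector:
    """Hypothetical trajectory encoder: normalized mean of B(s) over the trajectory."""
    embeddings = [B(s) for s in trajectory]
    mean = [sum(col) / len(embeddings) for col in zip(*embeddings)]
    return normalize(mean)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class LogisticDiscriminator:
    """D(s, z) = sigmoid(w . [s; z] + b); label 1 = dataset pair, 0 = policy pair."""

    def __init__(self, dim: int) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def logit(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def sgd_step(self, x: Sequence[float], label: float, lr: float) -> None:
        # Gradient of binary cross-entropy w.r.t. the logit is D(x) - label.
        g = sigmoid(self.logit(x)) - label
        self.w = [wi - lr * g * xi for wi, xi in zip(self.w, x)]
        self.b -= lr * g


def regularization_reward(disc: LogisticDiscriminator, s: State, z: Vector) -> float:
    """GAIL-style reward log(D / (1 - D)); for a logistic D this equals the logit."""
    return disc.logit(list(s) + list(z))


def identity_B(s: State) -> Vector:
    return [float(x) for x in s]


# Toy data: one dataset trajectory, plus hypothetical rollouts of the policy
# conditioned on that trajectory's latent that drift to a different region.
dataset_traj = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]
policy_states = [[-1.0, -1.0], [-1.1, -0.9], [-0.9, -1.1]]
z = embed_trajectory(identity_B, dataset_traj)

disc = LogisticDiscriminator(dim=4)
for _ in range(200):
    for s in dataset_traj:
        disc.sgd_step(s + z, 1.0, lr=0.1)
    for s in policy_states:
        disc.sgd_step(s + z, 0.0, lr=0.1)

r_data = regularization_reward(disc, [1.0, 1.0], z)
r_policy = regularization_reward(disc, [-1.0, -1.0], z)
print(r_data > r_policy)  # True: states resembling the dataset earn more reward
```

Under these assumptions, the policy for latent z is rewarded for visiting states that look like the trajectory z was inferred from. Maximizing this signal alongside the FB objective pushes policies to cover the dataset. One plausible combination, which is itself an assumption, is for the actor to maximize F(s, a, z)·z plus a weighted critic of this reward.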

The researchers evaluate FB-CPR across multiple task categories.
- Zero-shot performance: FB-CPR reaches 73.4% of top-line algorithm performance without task-specific training.
- Reward maximization: it outperforms unsupervised baselines and achieves 177% of DIFFUSER's performance at significantly lower computational cost.
- Goal reaching: FB-CPR performs comparably to specialized baselines and outperforms zero-shot alternatives by 48% on the proximity metric and 118% on the success metric.
- Human evaluation: task-specific algorithms may achieve higher numerical scores, but FB-CPR was consistently perceived as more "human-like". Participants rated its behaviors as more natural in 83% of reward-based tasks and 69% of goal-reaching scenarios.
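The results above mix two ways of reporting relative performance: a score "as X% of" a reference, and "outperforming by Y%". The two are easy to conflate, so a small sketch of the arithmetic follows. It assumes higher scores are better. It also assumes "by Y%" means a relative gain rather than a difference in absolute points, which the article does not state. The function names and scores are hypothetical and chosen only to show the arithmetic.

```python
def percent_of(score: float, reference: float) -> float:
    """Express `score` as a percentage of `reference` (e.g. "177% of DIFFUSER")."""
    return 100.0 * score / reference


def percent_improvement(score: float, baseline: float) -> float:
    """Relative gain over `baseline` (e.g. "outperforms ... by 48%")."""
    return 100.0 * (score - baseline) / baseline


# Hypothetical scores, chosen only to illustrate the arithmetic.
print(percent_of(177.0, 100.0))           # 177.0: 1.77x the reference
print(percent_improvement(177.0, 100.0))  # 77.0: the same result phrased as a gain
print(percent_improvement(148.0, 100.0))  # 48.0
```

So "177% of DIFFUSER" corresponds to a 77% relative improvement. Under the relative reading, "outperforms by 118%" means slightly more than double the baseline score.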

This research introduced FB-CPR, an algorithm that combines the zero-shot properties of forward-backward models with a new regularization technique for learning policies from unlabeled behavior datasets. The authors used it to train the first behavioral foundation model for complex humanoid agent control, and the method achieved state-of-the-art performance across diverse tasks. The approach still has notable limitations. FB-CPR struggles with tasks far removed from the motion-capture data. It occasionally produces imperfect movements, particularly in scenarios involving falling or standing up. The current model is restricted to proprioceptive observations and cannot navigate environments or interact with objects. To broaden the model's capabilities and generalization, future research directions include:
- integrating additional state variables
- exploring richer perception methods
- using video-based human-activity datasets
- developing more direct language-policy alignment techniques


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.


The post Meta FAIR Releases Meta Motivo: A New Behavioral Foundation Model for Controlling Virtual Physics-based Humanoid Agents for a Wide Range of Complex Whole-Body Tasks appeared first on MarkTechPost.

