MarkTechPost@AI 13小时前
PoE-World + Planner Outperforms Reinforcement Learning RL Baselines in Montezuma’s Revenge with Minimal Demonstration Data
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

PoE-World 是一种新型的 AI 世界建模方法,它结合了模块化和概率结构,利用大型语言模型生成小的、可解释的程序,每个程序代表环境中的特定规则。这种方法允许 AI 代理从少量演示数据中学习,并在复杂环境中有效规划。研究在 Atari 游戏如 Pong 和 Montezuma's Revenge 上进行了测试,结果表明,PoE-World 在低数据设置下优于现有的强化学习基线模型,展现出强大的泛化能力和高效的决策能力。

🤖 PoE-World 采用模块化设计,将环境分解为多个小的、可解释的程序,每个程序由大型语言模型合成,代表环境中的特定规则或行为。

🧩 该模型通过组合这些“专家”程序来预测未来的状态,支持从少量演示数据中学习,并在新情况下进行泛化,从而实现高效的规划和决策。

🎮 在 Atari 游戏 (如 Pong 和 Montezuma's Revenge) 的测试中,PoE-World 表现在低数据设置下优于 PPO、ReAct 和 WorldCoder 等基线模型,尤其是在 Montezuma's Revenge 中表现出色。

💡 PoE-World 能够准确地模拟游戏动态,即使在修改后的环境中也能保持高性能,这得益于其详细的、约束感知的表示,从而实现更好的规划和更逼真的游戏内行为。

The Importance of Symbolic Reasoning in World Modeling

Understanding how the world works is key to creating AI agents that can adapt to complex situations. While neural network-based models, such as Dreamer, offer flexibility, they require massive amounts of data to learn effectively, far more than humans typically do. On the other hand, newer methods use program synthesis with large language models to generate code-based world models. These are more data-efficient and can generalize well from limited input. However, their use has been mostly limited to simple domains, such as text or grid worlds, as scaling to complex, dynamic environments remains a challenge due to the difficulty of generating large, comprehensive programs.

Limitations of Existing Programmatic World Models

Recent research has investigated the use of programs to represent world models, often leveraging large language models to synthesize Python transition functions. Approaches like WorldCoder and CodeWorldModels generate a single, large program, which limits their scalability in complex environments and their ability to handle uncertainty and partial observability. Some studies focus on high-level symbolic models for robotic planning by integrating visual input with abstract reasoning. Earlier efforts employed restricted domain-specific languages tailored to specific benchmarks or utilized conceptually related structures, such as factor graphs in Schema Networks. Theoretical models, such as AIXI, also explore world modeling using Turing machines and history-based representations.

Introducing PoE-World: Modular and Probabilistic World Models

Researchers from Cornell, Cambridge, The Alan Turing Institute, and Dalhousie University introduce PoE-World, an approach to learning symbolic world models by combining many small, LLM-synthesized programs, each capturing a specific rule of the environment. Instead of creating one large program, PoE-World builds a modular, probabilistic structure that can learn from brief demonstrations. This setup supports generalization to new situations, allowing agents to plan effectively, even in complex games like Pong and Montezuma’s Revenge. While it doesn’t model raw pixel data, it learns from symbolic object observations and emphasizes accurate modeling over exploration for efficient decision-making.

Architecture and Learning Mechanism of PoE-World

PoE-World models the environment as a combination of small, interpretable Python programs called programmatic experts, each responsible for a specific rule or behavior. These experts are weighted and combined to predict future states based on past observations and actions. By treating features as conditionally independent and learning from the full history, the model remains modular and scalable. Hard constraints refine predictions, and experts are updated or pruned as new data is collected. The model supports planning and reinforcement learning by simulating likely future outcomes, enabling efficient decision-making. Programs are synthesized using LLMs and interpreted probabilistically, with expert weights optimized via gradient descent.

Empirical Evaluation on Atari Games

The study evaluates their agent, PoE-World + Planner, on Atari’s Pong and Montezuma’s Revenge, including harder, modified versions of these games. Using minimal demonstration data, their method outperforms baselines such as PPO, ReAct, and WorldCoder, particularly in low-data settings. PoE-World demonstrates strong generalization by accurately modeling game dynamics, even in altered environments without new demonstrations. It’s also the only method to consistently score positively in Montezuma’s Revenge. Pre-training policies in PoE-World’s simulated environment accelerate real-world learning. Unlike WorldCoder’s limited and sometimes inaccurate models, PoE-World produces more detailed, constraint-aware representations, leading to better planning and more realistic in-game behavior.

Conclusion: Symbolic, Modular Programs for Scalable AI Planning

In conclusion, understanding how the world works is crucial to building adaptive AI agents; however, traditional deep learning models require large datasets and struggle to update flexibly with limited input. Inspired by how humans and symbolic systems recombine knowledge, the study proposes PoE-World. This method utilizes large language models to synthesize modular, programmatic “experts” that represent different parts of the world. These experts combine compositionally to form a symbolic, interpretable world model that supports strong generalization from minimal data. Tested on Atari games like Pong and Montezuma’s Revenge, this approach demonstrates efficient planning and performance, even in unfamiliar scenarios. Code and demos are publicly available.


Check out the Paper, Project Page and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.

The post PoE-World + Planner Outperforms Reinforcement Learning RL Baselines in Montezuma’s Revenge with Minimal Demonstration Data appeared first on MarkTechPost.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

PoE-World 世界建模 程序合成 AI规划
相关文章