MarkTechPost@AI October 10, 2024
Generative World Models for Enhanced Multi-Agent Decision-Making

This article introduces a generative world model for enhancing multi-agent decision-making. The model addresses the shortcomings of existing generative models in complex multi-agent decision-making scenarios by improving decision quality through simulated experience. Its components include a language-guided simulator and a causal transformer, it performs well in practical applications, and the authors' main contributions are summarized.

🎯 The generative world model aims to overcome the limitations of existing generative models in multi-agent decision-making scenarios; the framework introduces a language-guided simulator to strengthen the decision-making process and improve the quality of the generated solutions through simulated experience.

🌐 The dynamics component consists of a causal transformer and an image tokenizer: the causal transformer creates interaction transitions autoregressively, while the image tokenizer converts visual input into a structured format that can be analyzed, allowing agent interactions to be simulated over time.

💰 The reward model uses a bidirectional transformer trained by optimizing the probability of expert demonstrations, guided by plain-language task descriptions, which enables the model to associate specific actions with rewards.

📈 In practice, given an image of the environment and a task description, the world model simulates agent interactions and generates a sequence of images depicting the outcome; these simulated interactions are used to train the policy that controls agent behavior until it converges.

👍 The method's main advantages include producing consistent interaction sequences, so the model generates logically coherent results when simulating agent interactions, which enables more reliable decision-making and makes it possible to clearly explain why particular actions are rewarded.

Recent developments in generative models have paved the way for innovations in chatbots and image generation, among other areas. These models have demonstrated remarkable performance across a range of tasks, but they frequently falter when faced with intricate, multi-agent decision-making scenarios. This issue stems largely from generative models' inability to learn by trial and error, an essential component of human cognition. Rather than actually experiencing situations, they rely mainly on pre-existing data, which leads to inadequate or inaccurate solutions in more complex settings.

A novel method has been developed to overcome this limitation by incorporating a language-guided simulator into the multi-agent reinforcement learning (MARL) framework. This paradigm seeks to enhance the decision-making process through simulated experience, thereby improving the quality of the generated solutions. The simulator functions as a world model that learns two essential components: dynamics and reward. The dynamics model forecasts how the environment will change in response to the agents' actions, while the reward model assesses the outcomes of those actions.

The dynamics model is made up of a causal transformer and an image tokenizer. The causal transformer creates interaction transitions in an autoregressive way, while the image tokenizer transforms visual input into a structured format that the model can analyze. To simulate how agents interact over time, the model predicts each step in the interaction sequence from the steps that came before it. The reward model, by contrast, uses a bidirectional transformer. Its training involves optimizing the probability of expert demonstrations, which serve as examples of optimal behavior. Guided by plain-language task descriptions, the reward model learns to link particular actions to rewards.
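To make this architecture concrete, here is a minimal PyTorch sketch of the two components described above. The module names, layer sizes, and the placeholder tokenizer are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch: autoregressive dynamics over image tokens plus a
# bidirectional reward scorer. Sizes and names are assumptions.
import torch
import torch.nn as nn

class ImageTokenizer(nn.Module):
    """Stand-in VQ-style tokenizer: maps an image to a grid of discrete tokens."""
    def __init__(self, vocab_size=1024, patch=16):
        super().__init__()
        self.vocab_size = vocab_size
        self.patch = patch

    def encode(self, image):                       # image: (B, 3, H, W)
        b, _, h, w = image.shape
        n = (h // self.patch) * (w // self.patch)  # one token per patch
        # Placeholder: a real tokenizer would quantize learned patch embeddings.
        return torch.randint(0, self.vocab_size, (b, n), device=image.device)

class CausalDynamics(nn.Module):
    """Causal transformer predicting the next interaction token autoregressively."""
    def __init__(self, vocab_size=1024, d_model=256, n_layers=4, n_heads=8, max_len=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                     # tokens: (B, T)
        t = tokens.shape[1]
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(tokens.device)
        x = self.embed(tokens) + self.pos(torch.arange(t, device=tokens.device))
        return self.head(self.blocks(x, mask=mask))  # next-token logits

class BidirectionalReward(nn.Module):
    """Bidirectional transformer scoring a (task description, transition) pair."""
    def __init__(self, vocab_size=1024, d_model=256, n_layers=2, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)  # no causal mask
        self.head = nn.Linear(d_model, 1)

    def forward(self, task_tokens, transition_tokens):
        x = self.embed(torch.cat([task_tokens, transition_tokens], dim=1))
        return self.head(self.blocks(x).mean(dim=1))  # one scalar reward per sequence
```

The reward model's training objective (maximizing the likelihood of expert demonstrations) is not shown here; the sketch only illustrates how the causal and bidirectional components differ structurally.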

In practical terms, given an image of the environment's current state and a task description, the world model can simulate agent interactions and produce a series of images depicting the outcome of those interactions. The world model is used to train the policy, which controls the agents' behavior, until it converges, indicating that it has found an effective strategy for the given task. The resulting image sequence is the model's solution to the decision-making problem, visually depicting the task's progression.
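A rough sketch of what such an imagination-based training loop might look like is shown below. The `world_model.rollout` interface and the convergence test are hypothetical stand-ins, not the paper's actual API.

```python
# Minimal sketch (assumed interface): train a policy on imagined rollouts
# produced by the learned world model until its imagined return stops improving.
import torch

def train_policy_in_imagination(world_model, policy, optimizer, initial_image,
                                task_description, horizon=32, max_iters=10_000,
                                tol=1e-4):
    prev_return = float("-inf")
    frames = None
    for _ in range(max_iters):
        # The world model rolls out an imagined interaction sequence: predicted
        # frames and a reward per step, conditioned on the task description.
        frames, rewards, log_probs = world_model.rollout(
            policy, initial_image, task_description, horizon)
        episode_return = rewards.sum()

        # Simple policy-gradient style update on the imagined experience.
        loss = -(log_probs * rewards.detach()).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Stop once the imagined return plateaus, i.e. the policy has converged.
        if abs(episode_return.item() - prev_return) < tol:
            break
        prev_return = episode_return.item()
    return frames  # the final imagined image sequence is the produced solution
```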

According to empirical findings, this paradigm considerably enhances the quality of solutions to multi-agent decision-making problems. It has been evaluated on the well-known StarCraft Multi-Agent Challenge (SMAC) benchmark, which is widely used to assess MARL systems. The framework performs well on the tasks it was trained on and also generalizes well to new, unseen tasks.

One of this approach's main advantages is its capacity to produce consistent interaction sequences: when it simulates agent interactions, the model generates logical and coherent results, leading to more trustworthy decision-making. Furthermore, because the reward functions are explicable at each interaction step, the model can clearly explain why particular behaviors were rewarded, which is essential for understanding and improving the decision-making process.

The team summarizes its primary contributions as follows:

    New MARL Datasets for SMAC: The work presents new datasets for the StarCraft Multi-Agent Challenge (SMAC), in which a parser automatically generates ground-truth images and task descriptions from a given state (a hypothetical record layout is sketched after this list).
    Learning before Interaction (LBI): The study introduces an interactive simulator that improves multi-agent decision-making by generating high-quality answers through trial-and-error experience.
    Superior Performance and Transparency: Empirically, LBI outperforms various offline learning techniques on both training and unseen tasks, and it offers transparency in decision-making by producing consistent imagined trajectories and explicable rewards for every interaction state.
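For illustration, one record in such an LBI-style SMAC dataset might be laid out as follows; all field names are assumptions, not the released schema.

```python
# Hypothetical layout of one SMAC dataset record generated by the parser.
from dataclasses import dataclass
from typing import List

@dataclass
class SMACRecord:
    state: dict                # raw SMAC game state the parser starts from
    ground_truth_image: bytes  # rendered image generated from that state
    task_description: str      # plain-language description of the task
    actions: List[int]         # expert joint action (one entry per agent)
    reward: float              # scalar reward associated with the transition
```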

Check out the Paper. All credit for this research goes to the researchers of this project.



