MarkTechPost@AI, January 3
Mixture-of-Denoising Experts (MoDE): A Novel Generalist MoE-based Diffusion Policy

MoDE is a novel Mixture-of-Experts (MoE) Diffusion Policy designed for tasks such as imitation learning and robot design. Through noise-conditioned routing and a self-attention mechanism, it denoises more effectively at different noise levels, improving efficiency. Unlike traditional methods, MoDE computes and integrates only the experts needed at each noise level, reducing latency and computational cost. It achieves faster, more efficient inference while maintaining performance, and saves substantial compute by using only a subset of the model's parameters in each forward pass. Experiments show that MoDE outperforms other Diffusion Policies on multiple benchmarks and generalizes strongly on zero-shot tasks.

💡MoDE uses noise-conditioned routing, determining the expert routing from the noise level at each step, and improves efficiency with a frozen CLIP language encoder for language conditioning and FiLM-conditioned ResNets for image encoding.

⚙️The MoDE model comprises a sequence of Transformer blocks, each responsible for a different denoising phase; noise-aware positional embeddings and expert caching ensure that only the necessary experts are used, cutting computational overhead.

🚀Experiments show that MoDE achieves top performance on benchmarks such as LIBERO-90, surpassing models like the Diffusion Transformer and QueST. Pretraining further boosts MoDE's performance, demonstrating its ability to learn long-horizon tasks and its computational efficiency.

🎯MoDE also excels on the CALVIN language-skills benchmark, outperforming models such as RoboFlamingo and GR-1 while remaining more computationally efficient, and it beats all baselines on zero-shot generalization tasks, demonstrating strong generalization ability.

Diffusion Policies in Imitation Learning (IL) can generate diverse agent behaviors, but as models grow in size and capability, their computational demands increase, slowing both training and inference. This is a challenge for real-time applications, especially in compute-constrained settings such as mobile robots. Because these policies require many parameters and many denoising steps, they are ill-suited to such scenarios: although they scale well with more data, their computational cost remains a significant limitation.
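To make that cost concrete, here is a minimal, illustrative sketch of diffusion-policy inference (not the authors' code; the network and schedule are toy placeholders): every denoising step is a full forward pass, so inference cost scales with model size multiplied by the number of steps.

```python
# Illustrative sketch of diffusion-policy inference (not MoDE's released code).
# Each denoising step runs a full forward pass, so cost ~ model size x num_steps.
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Toy denoising network; a real policy would also condition on observations."""
    def __init__(self, action_dim: int = 7, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(action_dim + 1, hidden),
            nn.GELU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, noisy_action, sigma):
        return self.net(torch.cat([noisy_action, sigma], dim=-1))

@torch.no_grad()
def sample_action(denoiser, action_dim=7, num_steps=50):
    """Start from Gaussian noise and iteratively denoise toward an action."""
    action = torch.randn(1, action_dim)
    for step in range(num_steps, 0, -1):
        sigma = torch.full((1, 1), step / num_steps)  # toy linear noise schedule
        action = action - denoiser(action, sigma)     # one full forward pass per step
    return action
```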

Current methods in robotics, such as Transformer-based Diffusion Models, are used for tasks like Imitation Learning, Offline Reinforcement Learning, and robot design. These models rely on Convolutional Neural Networks (CNNs) or transformers with conditioning techniques like FiLM. While capable of generating multimodal behavior, they are computationally expensive: large parameter counts and many denoising steps slow both training and inference, making them impractical for real-time applications. Additionally, Mixture-of-Experts (MoE) models face issues such as expert collapse and inefficient use of capacity. Despite load-balancing solutions, these models struggle to jointly optimize the router and the experts, leading to suboptimal performance.
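For readers unfamiliar with FiLM, the sketch below shows a minimal feature-wise linear modulation layer; the module name and shapes are illustrative assumptions, not taken from any specific policy's codebase.

```python
# Minimal FiLM (feature-wise linear modulation) layer: a conditioning vector
# (e.g., a language or timestep embedding) produces per-channel scale and shift
# terms that modulate visual feature maps. Names and shapes are illustrative.
import torch
import torch.nn as nn

class FiLM(nn.Module):
    def __init__(self, cond_dim: int, num_channels: int):
        super().__init__()
        self.to_scale_shift = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, features, cond):
        # features: (B, C, H, W); cond: (B, cond_dim)
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        scale = scale[:, :, None, None]  # broadcast over spatial dims
        shift = shift[:, :, None, None]
        return features * (1 + scale) + shift
```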

To address the limitations of current methods, researchers from the Karlsruhe Institute of Technology and MIT introduced MoDE, a Mixture-of-Experts (MoE) Diffusion Policy designed for tasks such as Imitation Learning and robot design. MoDE improves efficiency by using noise-conditioned routing and a self-attention mechanism for more effective denoising at various noise levels. Unlike traditional methods that rely on a complex denoising process, MoDE computes and integrates only the necessary experts at each noise level, reducing latency and computational cost. This architecture enables faster and more efficient inference while maintaining performance, achieving significant computational savings by utilizing only a subset of the model’s parameters during each forward pass.
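A hedged sketch of the core routing idea follows: unlike a standard MoE router that gates on token content, the gate here depends only on the noise level, so one expert subset serves every token at a given denoising step. All names below are illustrative assumptions, not the paper's implementation.

```python
# Sketch of noise-conditioned top-k expert routing (assumed interface, not the
# paper's code). The router sees only the noise level sigma, so the selected
# expert subset is shared by every token at that denoising step.
import torch
import torch.nn as nn

class NoiseConditionedMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(1, num_experts)  # gates on the scalar noise level only
        self.top_k = top_k

    def forward(self, tokens, sigma):
        # tokens: (B, T, dim); sigma: (B, 1) noise level of the current step
        gate = self.router(sigma).softmax(dim=-1)        # (B, num_experts)
        weights, indices = gate.topk(self.top_k, dim=-1)
        out = torch.zeros_like(tokens)
        for b in range(tokens.size(0)):
            for w, e in zip(weights[b], indices[b]):
                out[b] = out[b] + w * self.experts[int(e)](tokens[b])  # run chosen experts only
        return out
```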

The MoDE framework employs a noise-conditioned approach where the routing of experts is determined by the noise level at each step. It uses a frozen CLIP language encoder for language conditioning and FiLM-conditioned ResNets for image encoding. The model incorporates a sequence of transformer blocks, each responsible for different denoising phases. By introducing noise-aware positional embeddings and expert caching, MoDE ensures that only the necessary experts are used, reducing computational overhead. The researchers conducted extensive analyses of MoDE’s components, which provide useful insights for designing efficient and scalable transformer architectures for diffusion policies. Pretraining on diverse multi-robot datasets allows MoDE to outperform existing generalist policies.
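Expert caching falls out of this design: with a fixed noise schedule, the routing decision for every step can be precomputed once before sampling begins. A small sketch under those assumptions (the function and names are hypothetical):

```python
# Hypothetical sketch of expert caching: since routing depends only on the noise
# level and the schedule is fixed, the expert choices for all denoising steps can
# be precomputed once, removing router overhead from the inference loop.
import torch

@torch.no_grad()
def precompute_routes(router, noise_schedule, top_k=2):
    """Map each denoising step to the indices of the experts it will use."""
    routes = {}
    for step, sigma in enumerate(noise_schedule):
        gate = router(torch.tensor([[float(sigma)]])).softmax(dim=-1)
        routes[step] = gate.topk(top_k, dim=-1).indices.squeeze(0).tolist()
    return routes
```

During sampling, step t then dispatches tokens directly to routes[t], so only the cached experts are ever loaded or executed.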

The researchers evaluated MoDE on several key questions: how it performs against other policies and diffusion transformer architectures, how large-scale pretraining affects its performance, efficiency, and speed, and how effective its token routing strategies are across environments. For a fair comparison, the experiments matched MoDE against prior diffusion transformer architectures with a similar number of active parameters and tested both long-horizon and short-horizon tasks. MoDE achieved the highest performance on benchmarks such as LIBERO-90, outperforming models like the Diffusion Transformer and QueST, and pretraining boosted its performance further, demonstrating its ability to learn long-horizon tasks with high computational efficiency. MoDE also led on the CALVIN language-skills benchmark, surpassing models like RoboFlamingo and GR-1 while remaining more computationally efficient, and it outperformed all baselines on zero-shot generalization tasks, demonstrating strong generalization.

In conclusion, the proposed framework improves performance and efficiency by combining a mixture of experts, a transformer backbone, and a noise-conditioned routing strategy. The model outperforms previous Diffusion Policies while requiring fewer parameters and lower computational cost, so the framework can serve as a baseline for improving model scalability in future research. Future work can also explore applying MoDE to other domains, since the architecture has so far continued to scale while maintaining high performance.


Check out the Paper and Model on Hugging Face. All credit for this research goes to the researchers of this project.



