cs.AI updates on arXiv.org, 19 hours ago
High-Throughput Distributed Reinforcement Learning via Adaptive Policy Synchronization

 

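This paper presents ClusterEnv, a lightweight, learner-agnostic interface for distributed environment execution that improves the scalability of reinforcement learning workloads by separating simulation from training logic. It introduces the DETACH pattern and proposes Adaptive Actor Policy Synchronization (AAPS), a mechanism intended to keep sample efficiency high while keeping synchronization overhead low.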

arXiv:2507.10990v1 Announce Type: cross Abstract: Scaling reinforcement learning (RL) workloads often requires distributing environment simulation across compute clusters. Existing frameworks entangle simulation, learning logic, and orchestration into monolithic systems, limiting modularity and reusability. We present ClusterEnv, a lightweight, learner-agnostic interface for distributed environment execution that mirrors the Gymnasium API. ClusterEnv introduces the DETACH pattern, which decouples simulation from training by offloading reset() and step() operations to remote workers while keeping learning centralized. To address policy staleness in distributed execution, we propose Adaptive Actor Policy Synchronization (AAPS), a divergence-triggered update mechanism that reduces synchronization overhead without sacrificing performance. ClusterEnv integrates cleanly into existing RL pipelines, supports both on-policy and off-policy methods, and requires minimal code changes. Experiments on discrete control tasks demonstrate that AAPS achieves high sample efficiency with significantly fewer weight updates. Source code is available at https://github.com/rodlaf/ClusterEnv.
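The abstract describes AAPS only as a divergence-triggered update mechanism, so the following is a minimal sketch of that general idea and not the ClusterEnv implementation. Everything in it is an assumption made for illustration: the linear softmax policy, the use of mean KL divergence over recent states as the trigger, the `kl_threshold` value, and the hypothetical names `Actor` and `maybe_sync`. It also assumes the check runs somewhere that can see both the actor's and the learner's weights.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_probs(weights, state):
    """Action probabilities of a toy linear policy (assumed for illustration)."""
    logits = [sum(w * s for w, s in zip(row, state)) for row in weights]
    return softmax(logits)

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def mean_kl(actor_weights, learner_weights, states):
    """Average KL between the actor's and the learner's policy over some states."""
    total = 0.0
    for s in states:
        total += kl_divergence(policy_probs(actor_weights, s),
                               policy_probs(learner_weights, s))
    return total / len(states)

class Actor:
    """Hypothetical remote actor holding a possibly stale copy of the policy."""

    def __init__(self, weights, kl_threshold):
        self.weights = [row[:] for row in weights]
        self.kl_threshold = kl_threshold  # assumed hyperparameter
        self.sync_count = 0

    def maybe_sync(self, learner_weights, recent_states):
        """Copy the learner's weights only if the policies have drifted apart."""
        divergence = mean_kl(self.weights, learner_weights, recent_states)
        if divergence > self.kl_threshold:
            self.weights = [row[:] for row in learner_weights]
            self.sync_count += 1
            return True
        return False

# Toy demonstration: a tiny learner update is skipped, a large one triggers a sync.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
initial = [[0.0, 0.0], [0.0, 0.0]]
actor = Actor(initial, kl_threshold=0.01)

small_update = [[0.01, 0.0], [0.0, 0.0]]
large_update = [[1.0, 0.0], [0.0, 0.0]]

synced_small = actor.maybe_sync(small_update, states)
synced_large = actor.maybe_sync(large_update, states)
```

In this toy run the small update leaves the actor's policy almost unchanged, so no weights are transferred, while the large update exceeds the threshold and causes one sync. The intended trade-off is that a higher threshold means fewer weight transfers at the cost of more policy staleness.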


Related tags

Distributed reinforcement learning, ClusterEnv, AAPS, environment execution, reinforcement learning frameworks