Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound

cs.AI updates on arXiv.org, 20 hours ago


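This paper applies the Neyman-Rubin framework to DRL. It computes a causal bound on the factual loss, which makes effective use of otherwise discarded data and improves the efficiency of models such as DQN and SAC in the Atari 2600 and MuJoCo domains. In experiments, the reward ratio improves by up to 2,427% and the experience replay buffer size shrinks by up to 96%.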

arXiv:2507.11269v1 Announce Type: cross

Abstract: Deep reinforcement learning (DRL) agents excel at solving complex decision-making tasks across many domains. However, they often require a large number of training steps and a vast experience replay buffer, which leads to significant computational and resource demands. To address these challenges, we introduce a novel theoretical result that brings the Neyman-Rubin potential outcomes framework into DRL. Unlike most methods, which focus on bounding the counterfactual loss, we establish a causal bound on the factual loss, which is analogous to the on-policy loss in DRL. This bound is computed by storing past value network outputs in the experience replay buffer, making effective use of data that is usually discarded. Extensive experiments across the Atari 2600 and MuJoCo domains with agents such as DQN and SAC show up to a 2,427% higher reward ratio than the same agents without our proposed term, and a reduction of up to 96% in experience replay buffer size, significantly improving sample efficiency at negligible cost.
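The abstract does not state the bound itself, but it does describe the data-structure change behind it: the value network's past outputs are kept in the replay buffer next to each transition instead of being thrown away. The following is a minimal Python sketch of that idea under stated assumptions. The `Transition` fields, the `ValueRecordingReplayBuffer` class, and the `mean_value_drift` helper are hypothetical names chosen for illustration. `mean_value_drift` is only a placeholder showing where stored outputs could enter a loss term; it is NOT the paper's causal bound on the factual loss.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Sequence, Tuple

State = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    state: State
    action: int
    reward: float
    next_state: State
    done: bool
    # Hypothetical field: the value network's output Q(s, a) recorded at the
    # moment this transition was written to the buffer (normally discarded).
    stored_value: float


class ValueRecordingReplayBuffer:
    """FIFO replay buffer that keeps each transition's past value output."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # deque(maxlen=...) evicts the oldest transition once capacity is hit.
        self._data: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self._data.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement, as in a standard DQN buffer.
        return self._rng.sample(list(self._data), batch_size)

    def __len__(self) -> int:
        return len(self._data)


def mean_value_drift(
    batch: Sequence[Transition],
    q_fn: Callable[[State, int], float],
) -> float:
    """Hypothetical stand-in, NOT the paper's causal bound: mean absolute gap
    between the current value output and the output stored in the buffer."""
    gaps = [abs(q_fn(t.state, t.action) - t.stored_value) for t in batch]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Two toy functions standing in for an older and a newer snapshot
    # of the value network.
    def old_q(s: State, a: int) -> float:
        return sum(s) + a

    def new_q(s: State, a: int) -> float:
        return sum(s) + a + 0.5

    buf = ValueRecordingReplayBuffer(capacity=3)
    for i in range(5):
        s = (float(i), 0.0)
        a = i % 2
        # Record the value output at insertion time alongside the transition.
        buf.add(Transition(s, a, 1.0, (float(i + 1), 0.0), False, old_q(s, a)))

    print(len(buf))  # 3: the two oldest transitions were evicted
    print(mean_value_drift(buf.sample(2), new_q))  # 0.5
```

The only extra cost in this sketch is one float per stored transition, which is consistent in spirit with the abstract's "negligible cost" claim; how the paper actually turns these stored outputs into a bound is not reproduced here.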