cs.AI updates on arXiv.org 07月29日 12:21
Minding Motivation: The Effect of Intrinsic Motivation on Agent Behaviors
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文通过实证评估三种内驱力方法对MiniGrid游戏环境内强化学习行为的影响,并与广义奖励匹配方法进行比较,发现内驱力方法通过增加初始奖励和改变代理行为产生影响,广义奖励匹配方法在一定程度上缓解了奖励破解问题。

arXiv:2507.19725v1 Announce Type: new Abstract: Games are challenging for Reinforcement Learning~(RL) agents due to their reward-sparsity, as rewards are only obtainable after long sequences of deliberate actions. Intrinsic Motivation~(IM) methods -- which introduce exploration rewards -- are an effective solution to reward-sparsity. However, IM also causes an issue known as `reward hacking' where the agent optimizes for the new reward at the expense of properly playing the game. The larger problem is that reward hacking itself is largely unknown; there is no answer to whether, and to what extent, IM rewards change the behavior of RL agents. This study takes a first step by empirically evaluating the impact on behavior of three IM techniques on the MiniGrid game-like environment. We compare these IM models with Generalized Reward Matching~(GRM), a method that can be used with any intrinsic reward function to guarantee optimality. Our results suggest that IM causes noticeable change by increasing the initial rewards, but also altering the way the agent plays; and that GRM mitigated reward hacking in some scenarios.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

内驱力方法 强化学习 奖励破解 MiniGrid游戏 广义奖励匹配
相关文章