LessWrong, March 28
[Linkpost] The value of initiating a pursuit in temporal decision-making

 

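This article examines optimal strategies in temporal decision-making and how individual behavior deviates from them. The study argues that temporal discount functions arise from time-varying reward probabilities. By analyzing the value of initiating a pursuit, the authors propose a framework that interprets decision heuristics as (mis)estimates of key parameters. The study identifies the components of time's cost and shows how the temporal discounting function of a reward-rate-optimal agent is shaped by the temporal structure of the environment. It also analyzes how parameter misestimation affects human and animal behavior and proposes the "Malapportionment Hypothesis," which offers a key to understanding and quantifying patterns of error in temporal decision-making.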

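🤔 The study starts from reward-rate maximization as a normative principle in behavioral ecology, neuroscience, economics, and artificial intelligence. It aims to identify and compare equations for evaluating the worth of initiating a pursuit, so that an agent can maximize reward rate.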

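💡 The article distinguishes two categories of temporal decisions: forgo decisions and choice decisions. Across both, the authors generalize and analyze the reward-rate-maximizing solution for evaluating a pursuit.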

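⏳ Time's cost has two components: an apportionment cost and an opportunity cost. The temporal discounting function of a reward-rate-optimal agent depends not only on the properties of the pursuit under consideration but also on the time spent and reward acquired outside it.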

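🔍 The study further shows that purported signs of suboptimal behavior in humans and animals (hyperbolic discounting, the Delay effect, the Magnitude effect, the Sign effect) are in fact consistent with reward-rate maximization. Actual errors in behavior are attributed to misjudging the apportionment of time, specifically underweighting the time spent outside a pursuit.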

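🧐 The study proposes the "Malapportionment Hypothesis": underweighting the time spent outside a pursuit is the likely driver of suboptimal temporal decision-making. This view supports deeper analysis of the learning algorithms and representational architectures that humans and animals use to evaluate the worth of pursuits.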

Published on March 27, 2025 9:47 PM GMT

This eLife paper, "The value of initiating a pursuit in temporal decision-making" by Elissa Sutlief, Charlie Walters, Tanya Marton, and Marshall G Hussain Shuler, seems to dissolve the question of which temporal discount function to choose by explaining how discounting results from time-varying reward probabilities. It provides a framework for interpreting decision heuristics as (mis)estimates of key parameters.

Abstract:

Reward-rate maximization is a prominent normative principle commonly held in behavioral ecology, neuroscience, economics, and artificial intelligence. Here, we identify and compare equations for evaluating the worth of initiating pursuits that an agent could implement to enable reward-rate maximization. We identify two fundamental temporal decision-making categories requiring the valuation of the initiation of a pursuit—forgo and choice decision-making—over which we generalize and analyze the optimal solution for how to evaluate a pursuit in order to maximize reward rate. From this reward-rate-maximizing formulation, we derive expressions for the subjective value of a pursuit, i.e. that pursuit’s equivalent immediate reward magnitude, and reveal that time’s cost is composed of an apportionment, in addition to, an opportunity cost. By re-expressing subjective value as a temporal discounting function, we show precisely how the temporal discounting function of a reward-rate-optimal agent is sensitive not just to the properties of a considered pursuit, but to the time spent and reward acquired outside of the pursuit for every instance spent within it. In doing so, we demonstrate how the apparent discounting function of a reward-rate-optimizing agent depends on the temporal structure of the environment and is a combination of hyperbolic and linear components, whose contributions relate the apportionment and opportunity cost of time, respectively. We further then show how purported signs of suboptimal behavior (hyperbolic discounting, the Delay effect, the Magnitude effect, the Sign effect) are in fact consistent with reward-rate maximization. Having clarified what features are and are not signs of optimal decision-making, we analyze the impact of the misestimation of reward rate-maximizing parameters in order to better account for the pattern of errors actually observed in humans and animals. 
We find that error in agents’ assessment of the apportionment of time that underweights the time spent outside versus inside a considered pursuit type is the likely driver of suboptimal temporal decision-making observed behaviorally. We term this the Malapportionment Hypothesis. This generalized form for reward-rate maximization and its relation to subjective value and temporal discounting allows the true pattern of errors exhibited by humans and animals to be more deeply understood, identified, and quantified, which is key to deducing the learning algorithms and representational architectures actually used by humans and animals to evaluate the worth of pursuits.
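
To make the abstract's definition concrete, here is a minimal Python sketch of subjective value as an equivalent immediate reward. It is my own derivation from that definition, not code or notation taken from the paper: I assume a simple cycle in which the agent spends `t_out` outside the pursuit earning `r_out`, then `t_in` inside it earning `r_in`, and I solve for the zero-delay reward that leaves the global reward rate unchanged. All function and variable names are hypothetical.

```python
def global_reward_rate(r_in, t_in, r_out, t_out):
    """Reward per unit time over one full outside + inside cycle."""
    return (r_in + r_out) / (t_in + t_out)


def subjective_value(r_in, t_in, r_out, t_out):
    """Immediate reward that yields the same global reward rate as the pursuit.

    Assumed derivation: solve (sv + r_out) / t_out == (r_in + r_out) / (t_in + t_out)
    for sv. The result factors into two parts that mirror the abstract's wording.
    """
    rho_out = r_out / t_out                        # outside reward rate
    after_opportunity_cost = r_in - rho_out * t_in  # linear in t_in
    apportionment_scale = t_out / (t_out + t_in)    # hyperbolic: 1 / (1 + t_in / t_out)
    return after_opportunity_cost * apportionment_scale


# Illustrative numbers only (not from the paper).
r_in, t_in, r_out, t_out = 4.0, 2.0, 3.0, 6.0

sv = subjective_value(r_in, t_in, r_out, t_out)
rate_with_pursuit = global_reward_rate(r_in, t_in, r_out, t_out)
rate_with_sv = (sv + r_out) / t_out  # take sv immediately instead of pursuing

print(sv, rate_with_pursuit, rate_with_sv)
```

Under these assumptions the linear term plays the role of the opportunity cost and the hyperbolic factor plays the role of the apportionment cost, which is why the discounting function depends on what happens outside the pursuit as well as inside it.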

 

Patterns of temporal decision-making in Choice and Forgo situations deviate from optimal (top row) under various parameter misestimations (subsequent rows). Characterization of the nature of suboptimality is aided by the use of the outside reward rate as the independent variable influencing decision-making (x-axis), plotted against the degree of error (y-axis) of a given parameter (ω<1 underestimation, ω=1 actual, ω>1 overestimation). The leftmost column provides a schematic exemplifying true outside (gold) and inside (blue) pursuit parameters and the nature of parameter error (dashed red) investigated per row (all showing an instance of underestimation).
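
The kind of error the caption describes can be illustrated with a small Python sketch. It rests on my own assumptions rather than the paper's exact parameterization: the error factor `omega` multiplies the agent's estimate of time spent outside the pursuit, the outside reward rate `rho_out` is held fixed, and the valuation formula is the equivalent-immediate-reward form derived from the abstract's definition. The option values and all names are hypothetical.

```python
def estimated_sv(r_in, t_in, rho_out, t_out, omega=1.0):
    """Subjective value when outside time is misestimated by a factor omega.

    omega < 1 underestimates outside time, omega == 1 is accurate,
    omega > 1 overestimates it (assumed form of the error).
    """
    t_out_est = omega * t_out
    return (r_in - rho_out * t_in) * t_out_est / (t_out_est + t_in)


def preferred(options, rho_out, t_out, omega):
    """Name of the option with the highest estimated subjective value."""
    return max(
        options,
        key=lambda name: estimated_sv(*options[name], rho_out, t_out, omega),
    )


# Each option is (reward, time inside the pursuit); illustrative numbers only.
options = {"smaller_sooner": (1.0, 1.0), "larger_later": (3.0, 5.0)}
rho_out, t_out = 0.0, 10.0

accurate = preferred(options, rho_out, t_out, omega=1.0)
underweighted = preferred(options, rho_out, t_out, omega=0.05)

print(accurate, underweighted)
```

With accurate estimates the agent picks the larger-later option, which also gives the higher true reward rate in this toy setup (3/15 versus 1/11). Once outside time is heavily underweighted, the same valuation rule flips to the smaller-sooner option, which is the flavor of impatience the Malapportionment Hypothesis points to.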


