The Bellman equation does not apply to bounded rationality

Published on June 26, 2025 11:01 PM GMT

Quick quiz: which of the following is the definition of a rational agent?

1. An agent who follows the policy with the highest expected utility
2. An agent who follows the policy of choosing the action with the highest expected utility

Are those even different things? In the case of unbounded rationality, it is a bit of a trick question. Usually (1) is taken as the definition of rational, but according to the Bellman equation, the optimal policy is the one that chooses an optimal action in every state. Seems obvious, right?
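For reference, here is the Bellman optimality equation in standard MDP notation (my notation, not from the original post):

```latex
V^*(s) \;=\; \max_{a}\Big[\,R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s')\Big],
\qquad
\pi^*(s) \;=\; \operatorname*{arg\,max}_{a}\Big[\,R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s')\Big].
```

The second equation is exactly what makes (1) and (2) coincide for unbounded agents: the value-maximizing policy is the one that picks a maximizing action in every state.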

Bounded rationality is a form of rationality where the policy must be given by an algorithm, and the computational resources used by that algorithm matter. For example, if computation produces heat, then when it is (just for example) extremely unreasonably hot out, the agent might do less computation, or use a computer to do computations for it. (An unbounded rational agent never uses a computer, since it can just simulate computers "in its head".)

In bounded rationality, (1) and (2) are different, and (2) isn't even well-defined: there might not be a policy that can always choose the best action, and even if there is, the best action might depend on how much computation has already been done.

So, we cannot say whether a given action is bounded rational, only whether the entire policy is bounded rational.

Example: Guessing a preimage

For example, consider the following problem: given a random 512 bits, guess how many es are in the SHA-3 preimage of those bits.

In the specific case of 3fbef366d350d95ec6ef948dfc2fd1ebd6f1de928c266d86e2ed8e408b2f7b30cad0e14714830369f96ad41c2b58da80bab3bff90a6770910244e28b4f9e80be, you might be tempted to say "8 es", since you know nothing about that string, and a 512-bit preimage written in hex is 128 digits, each of which is an e with probability 1/16, for an expected 128/16 = 8 es.

However, a better guess is 10, since the preimage is b12ee4ed50edf33e4a388924f35f0190da54ee82ca0338d7dcc4adc3214da21e69d0b0b32789074fef3c05f02d6814da2c8d72057f50835d8f83265e6a4c3b57, which has exactly 10 es.

However, assuming reasonable computational limits, it is unlikely that the optimal policy says 10. b12ee4ed50edf33e4a388924f35f0190da54ee82ca0338d7dcc4adc3214da21e69d0b0b32789074fef3c05f02d6814da2c8d72057f50835d8f83265e6a4c3b57 is just a random string of bits I generated, and for any given random string of bits, it is unlikely that any given policy (including the optimal policy) has hard-coded it. There are simply too many possible inputs to hard-code a significant fraction of them! I am guessing that a policy that says 8 es on every input, including b12ee4ed50edf33e4a388924f35f0190da54ee82ca0338d7dcc4adc3214da21e69d0b0b32789074fef3c05f02d6814da2c8d72057f50835d8f83265e6a4c3b57 in particular, is close to optimal.
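As a rough sanity check (my own sketch, not from the post; the 1000-trial simulation and scoring by squared error are assumptions), one can empirically score the constant-guess policy. Since a 512-bit preimage is 128 hex digits, each an e with probability 1/16, the expected count is 128/16 = 8:

```python
import hashlib
import secrets

def simple_policy(digest_hex: str) -> int:
    # Ignore the input entirely: a 512-bit preimage is 128 hex digits,
    # each equal to 'e' with probability 1/16, so always guess 128/16 = 8.
    return 8

# Score the policy by mean squared error over random 512-bit preimages.
trials = 1000
total_sq_err = 0
for _ in range(trials):
    preimage = secrets.token_bytes(64)                 # 512 random bits
    digest_hex = hashlib.sha3_512(preimage).hexdigest()
    true_count = preimage.hex().count("e")             # actual number of es
    total_sq_err += (simple_policy(digest_hex) - true_count) ** 2

# MSE of the constant guess is roughly Var(Binomial(128, 1/16)) = 7.5.
print(total_sq_err / trials)
```

No policy that reads only the digest can do much better without inverting the hash, which is the point of the example.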

(Actually formalizing this is ~~too hard for me~~ left as an exercise to the reader. If you do so, I suggest going with a random oracle as a stand-in for SHA-3.)



