Simulation-aware causal decision theory: A case for one-boxing in CDT

Published on August 9, 2024 6:09 PM GMT

Disclaimer: I am a math student new to LW and the rationalist community. I could not find this idea anywhere else after a modest amount of research; I apologize if this has been brought up before.

Summary: CDT agents may behave differently than expected in situations involving reliable predictors due to the possibility that they are themselves simulations being run by the predictor. This may lead to decisions such as one-boxing in Newcomb's problem and acquiescing in counterfactual mugging.

The traditional conclusion of CDT in Newcomb's problem is that the decision of which boxes to take cannot causally influence the predictor's prediction. In the absence of any evidence about what that prediction is, a CDT agent should therefore take both boxes, since two-boxing leaves it better off than taking only the opaque box regardless of the prediction. The challenge I pose to this is that, in such a situation, a rational agent ought to have significant credence that it is itself a simulation being run by the predictor, and that in the case that the agent is a simulation, its choice really does have a causal influence on the "real" prediction.

Fundamental to this argument is the claim that a reliable predictor is almost certainly a reliable simulator. To see why this is plausible, consider that if a predictor were unable to predict some particular internal process of the agent, that process could be exploited to generate decisions that thwart the predictor. For a rough example, think of a human-like agent deciding "at random" whether to one-box or two-box. Human pseudorandom bit generation is far from a coin flip, but it depends on enough internal state that a predictor unable to simulate most, if not all, of the brain would likely not perform well enough to warrant being called reliable.

We must also make the assumption that an agent in this situation would, if it knew it was a simulation, strive to maximize expected utility for its real-world counterpart instead of its simulated self. I'm not entirely sure if this is a reasonable assumption, and I admit this may be a failure point of the theory, but I'll continue anyway. At the very least, it seems like the kind of behavior you would want to build into a utility-maximizing agent if you were to design one.

Now, for a concrete example of a decision problem where this observation is consequential, consider Newcomb's problem with $1,000 in the transparent box and $1,000,000 possibly in the opaque box. Assume the agent has 50% credence that it is a simulation. In the case that the agent is a simulation, it has no causal influence over its real-world counterpart's decision, so it assumes a uniform prior over the two possibilities (the counterpart one-boxing or two-boxing). In the case that the agent is in the real world, it assumes a uniform prior over whether the $1,000,000 is present in the opaque box. Given these assumptions, the expected value of two-boxing is

$$\frac{1}{2}\left(\frac{1}{2}\cdot\$1{,}000+\frac{1}{2}\cdot\$0\right)+\frac{1}{2}\left(\frac{1}{2}\cdot\$1{,}000+\frac{1}{2}\cdot\$1{,}001{,}000\right)=\$250{,}750,$$

(where the first top-level term corresponds to the simulation, the second to reality, the first parenthesized terms to the other agent (real or sim) two-boxing, and the second to the other agent one-boxing) whereas the expected value of one-boxing is

$$\frac{1}{2}\left(\frac{1}{2}\cdot\$1{,}001{,}000+\frac{1}{2}\cdot\$1{,}000{,}000\right)+\frac{1}{2}\left(\frac{1}{2}\cdot\$0+\frac{1}{2}\cdot\$1{,}000{,}000\right)=\$750{,}250,$$

and hence a CDT agent with these assumptions would choose to one-box.
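
To make the arithmetic explicit, here is a minimal Python sketch of the calculation above. The function name, the `p_sim` parameter, and the payoff constants are my own labels for the quantities in the text, not anything from a standard formalism.

```python
# A minimal sketch (assumptions and names are mine) of the expected-value
# calculation above: a CDT agent with credence p_sim that it is the
# predictor's simulation, maximizing its real-world counterpart's payoff.

SMALL = 1_000      # transparent box
BIG = 1_000_000    # opaque box, if filled

def newcomb_ev(my_choice: str, p_sim: float = 0.5) -> float:
    ev = 0.0
    # Case 1: I am the simulation. My choice fixes the prediction, and hence
    # whether the opaque box is filled; the real counterpart's choice gets a
    # uniform prior.
    box_full = (my_choice == "one-box")
    for other_choice in ("two-box", "one-box"):
        payoff = (BIG if box_full else 0) + (SMALL if other_choice == "two-box" else 0)
        ev += p_sim * 0.5 * payoff
    # Case 2: I am real. The box's contents were fixed by the simulation's
    # choice, over which I hold a uniform prior.
    for sim_choice in ("two-box", "one-box"):
        box_full = (sim_choice == "one-box")
        payoff = (BIG if box_full else 0) + (SMALL if my_choice == "two-box" else 0)
        ev += (1 - p_sim) * 0.5 * payoff
    return ev

print(newcomb_ev("two-box"))  # 250750.0
print(newcomb_ev("one-box"))  # 750250.0
```

In this sketch the verdict does not actually depend on the 50% figure: the difference in expected values works out to $1,001,000 × p_sim − $1,000, so any credence above roughly 0.1% of being the simulation already favors one-boxing.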

For the second example, consider a counterfactual mugging in which the agent is asked for $100 in exchange for a counterfactual $1,000,000 (to simplify the calculations, assume that the predictor only goes to the effort of making a prediction if the coin lands heads). Under the same assumptions as above, the CDT expected value of acquiescing to the mugging is

$$\frac{1}{2}\cdot\$1{,}000{,}000+\frac{1}{2}\cdot(-\$100)=\$499{,}950,$$

and the expected value of refusing the mugging is

$$\frac{1}{2}\cdot\$0+\frac{1}{2}\cdot\$0=\$0,$$

hence a CDT agent would acquiesce.
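
The same kind of sketch covers the mugging numbers; again, the names and the `p_sim` parameter are my own.

```python
# A companion sketch for the counterfactual-mugging calculation above, under
# the stated simplification that the predictor only simulates the agent when
# the real coin lands heads. Being asked for the $100 then means either I am
# that simulation, or I am real and the coin landed tails.

COST = 100
PRIZE = 1_000_000

def mugging_ev(pay: bool, p_sim: float = 0.5) -> float:
    # Simulation case: my choice *is* the prediction, so it decides whether
    # the real counterpart (whose coin landed heads) receives the prize.
    sim_payoff = PRIZE if pay else 0
    # Real case: the coin landed tails, so the only consequence is the $100.
    real_payoff = -COST if pay else 0
    return p_sim * sim_payoff + (1 - p_sim) * real_payoff

print(mugging_ev(True))   # 499950.0
print(mugging_ev(False))  # 0.0
```

Note that the simulated agent's own $100 is not counted anywhere: per the assumption above, a simulation maximizes expected utility for its real-world counterpart, not for itself.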


This idea relies on several assumptions, some more reasonable than others. I don't claim that this is how a pure CDT agent would actually behave in these scenarios, but it is at least an interesting alternative way to think about why the common interpretation of CDT appears irrational (or at least leaves its agents poorer in practice) in decision problems involving predictors, despite giving reasonable verdicts in most others. In the future, it would be interesting to investigate to what extent this version of CDT agrees with other decision theories, and on which problems it diverges.


