Weird Random Newcomb Problem

Published on April 11, 2025 1:09 PM GMT

Epistemic status: I'm pretty sure the problem is somewhat interesting, because it temporarily confused several smart people. I'm not at all sure that it is very original; probably somebody has already thought about something similar. I'm not at all sure that I have actually found a flaw in UDT, but I somewhat expect that a discussion of this problem may clarify UDT for some people.

This post emerged from my work in the "Deconfusing Commitment Races" project under the Supervised Program for Alignment Research (SPAR), led by James Faville. I'm grateful to SPAR for providing the intellectual environment and to James Faville personally for intellectual discussions and help with the draft of this post. Any mistakes are my own.

I used Claude and Gemini to help me with phrasing and grammar in some parts of this post.

Scenario

Let P be the set of all programs in a fixed programming language that satisfy a few conditions; in particular, every program in P takes a pair of numbers as input, always halts, and outputs either "take 1 box" or "take 2 boxes".

P is a finite, albeit extremely large, set.

Let n be some fixed bijective numbering of the programs in P.

Omega presents you with a variation of the Newcomb Problem. Omega selects a program a from P uniformly at random and runs it; depending on a's output, Omega decides whether to put $1,000,000 into the first box. The second box, as in the standard Newcomb Problem, contains $1,000.

However, the program that decides whether you get the money from one box or from both boxes is not (necessarily) a. Let's call this program b. Program b is also an element of P. It receives the pair (n(b), n(a)) as input – its own number according to the numbering n, and the number of the program a that Omega randomly selected. Based on this input, b must output either "take 1 box" or "take 2 boxes".
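To make the setup concrete, here is a minimal Python sketch of a single round. The tiny two-element stand-in for P, the function names, and the choice to run a on the pair (n(a), n(a)) (its own number, twice) are illustrative assumptions of the sketch, not part of the problem statement.

```python
import random

# Toy stand-in for the (extremely large) set P: each "program" maps a pair
# of numbers to either "take 1 box" or "take 2 boxes".
def always_two_box(i, j):
    return "take 2 boxes"

def one_box_on_equal(i, j):
    return "take 1 box" if i == j else "take 2 boxes"

P = [always_two_box, one_box_on_equal]         # illustrative; the real P is huge
n = {prog: idx for idx, prog in enumerate(P)}  # a fixed bijective numbering of P

def play_round(b):
    """One round of the game for a player program b drawn from P."""
    a = random.choice(P)                       # Omega's uniformly random selection
    # Assumption of this sketch: Omega runs a on its own number, twice.
    prediction = a(n[a], n[a])
    box1 = 1_000_000 if prediction == "take 1 box" else 0
    box2 = 1_000                               # the second box always holds $1,000
    choice = b(n[b], n[a])                     # b sees (its own number, a's number)
    return box1 + (box2 if choice == "take 2 boxes" else 0)
```

Averaging play_round over many rounds for different choices of b gives the programmer-level comparison discussed below.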

Questions

Question 1: Assume you are program b. You want to maximize the money you receive. What should you output if your input is (x, x) (i.e., the two numbers are equal)?

Question 2: Assume you are the programmer writing program b. You want to maximize the expected amount of money program b receives. How should you design b to behave when it receives an input of the form (x, x)?

(Feel free to pause and consider these questions before reading further.)

-

-

-

-

Question 1 appears analogous to the standard Newcomb Problem. Omega ran your own code b (acting as a) on the same input you received to determine whether to place the $1,000,000 in the first box. Since that run of your code produced the same output you are about to produce, taking one box seemingly nets you $1,000,000, while taking two boxes nets you only $1,000. So it seems you should take one box.

But in Question 2 it's better to write the program which always takes 2 boxes! Consider the programmer choosing between implementing b as one of two specific programs: b1, which outputs "take 2 boxes" on every input, and b2, which outputs "take 1 box" when its two input numbers are equal and otherwise behaves exactly like b1.

b1 gets an additional $1,000 that b2 doesn't get when n(a) = n(b2), and gets the same payoff as b2 in all other cases. The probability for a to be any specific program is independent of b. So b1 is strictly better than b2.
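Under the same illustrative assumptions as the sketch above (a toy two-element P, Omega running a on (n(a), n(a))), the comparison can be checked by exact enumeration over Omega's random choice of a; b1 and b2 below are the two candidate programs just described.

```python
# Exact expected payoffs of the two candidate programs under the toy reading
# of the setup (Omega picks a uniformly from P and runs it on (n(a), n(a))).

def b1(i, j):                      # always takes both boxes
    return "take 2 boxes"

def b2(i, j):                      # takes one box exactly when its inputs are equal
    return "take 1 box" if i == j else "take 2 boxes"

P = [b1, b2]                       # toy stand-in for the huge real P
n = {prog: idx for idx, prog in enumerate(P)}

def payoff(b, a):
    prediction = a(n[a], n[a])     # the sketch's assumption about a's input
    box1 = 1_000_000 if prediction == "take 1 box" else 0
    choice = b(n[b], n[a])
    return box1 + (1_000 if choice == "take 2 boxes" else 0)

for b in (b1, b2):
    expected = sum(payoff(b, a) for a in P) / len(P)
    print(b.__name__, expected)    # b1: 501000.0, b2: 500500.0
```

In this toy version the gap is exactly $1,000 / |P|: the two programs' payoffs differ only in the probability-1/|P| event that Omega happens to select b2.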

So, if you are the program, you prefer to choose one action. But if you are the programmer who writes this program, you prefer it to choose another action in the same circumstances.

Isn't it normal?

At first glance there are many problems like this. Justifications of advanced decision theories often use problems with this property. Usually their discussion ends with something like "...and that's why you should follow the optimal policy even if you didn't explicitly precommit to it beforehand", and the argument is made in one of the following framings:

"Universal precommitment" framing: You prefer to have an optimal policy. Sometimes the optimal policy includes locally non-optimal decisions (e.g., if someone predicts your policy). So you would like to make a precommitment for such cases. You can't think about all possible situations in advance, so it's better to make a precommitment "I will follow any precommitment which would be a good idea to make in advance". It would be a good idea to precommit to take one box in the normal Newcomb Problem, to pay the driver in Parfit's Hitchhiker, to pay Omega in Counterfactual Mugging. So you do it.

"Functional" framing: You control the input-output behavior of your decision-making function. This function can be instantiated not only in you, but in some other places, e.g., in someone who predicts your behavior. You necessarily control all instances at the same time. You prefer the instances in predictions to take one box/to pay the driver/to pay Omega. So you do it. 

"Anthropic" framing: You actually don't know if you are the "real you" or the simulation of yourself in the good prediction (otherwise it wouldn't be good). So normal causal expected utility calculations tell you it's better to take one box (it's -$1K if you are "real you", but +$1M if you are a simulation, and chances are 50-50)/to pay the driver (-$100 and +life, chances of being a simulation are at least 50%)/to pay Omega (+$500 and -$100, chances of being a simulation are 2 to 1).

No, it's weird (I think)

As we see, these framings usually point in the same direction. But not here! When the program receives two equal numbers as input, the advice from these approaches is:

"Universal precommitment": The optimal policy is to always take two boxes. Do it!

"Functional": You control yourself and Omega's program (because both are the same program), and it's better for you if you both take one box. So leave $1K on the table to get a million!

"Anthropic": You almost certainly are Omega's program and not the player's program. Take one simulated box to put $1M in the real one!

So here the "universal precommitment" approach is in conflict with the other two approaches. And for me personally, the advice to take both boxes here (if you are the program; if you are the programmer, you should definitely write a program that always takes two boxes, no problem with that) is much more counterintuitive than the "universal precommitment" advice in any other decision theory problem I know. I think the "universal precommitment" framing is the closest to what UDT actually means, so my confidence in it has now been somewhat shaken.

I think it's weird. Do you?


