Published on July 28, 2025 2:20 AM GMT
The people who are crazy enough to think they can change the world are the ones who do.
~Steve Jobs
The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.
~George Bernard Shaw
"You could call it heroic responsibility, maybe," Harry Potter said. "Not like the usual sort. It means that whatever happens, no matter what, it's always your fault. Even if you tell Professor McGonagall, she's not responsible for what happens, you are. Following the school rules isn't an excuse, someone else being in charge isn't an excuse, even trying your best isn't an excuse. There just aren't any excuses, you've got to get the job done no matter what."
~Eliezer Yudkowsky
I just wanted to write a quick post about the decision theory which allows me to receive $1,000,000 in Newcomb’s Problem, while still being more or less causal, and some implications of this for saving the world. You probably need a decent background on Newcomb’s Problem and decision theory to understand that part of this post, as I’m not going into much detail.
Update: I was told this may sound like functional decision theory. Looking at the definition, that could be true, but my emphasis may be different, and I haven't formalized this enough or looked into FDT enough to be sure.
Pragmatic Decision Theory & Applications
I have had this decision theory for as long as I can remember, and my initial intuition was that it’s extremely obvious that you should one-box.
I’ve thought about it a bit and I guess I can see both sides, but one-boxing still seems obvious to me. Here’s why:
I tend to use a form of realistic “philosophical pragmatism” as my epistemic system. Philosophical pragmatism roughly says that you should choose whatever beliefs have the best outcomes. In my particular version, I still try to evaluate all evidence as thoroughly as possible and update on it in a Bayesian way. Then I just add an additional step where I evaluate what the most effective belief is for my goals and choose to act on whatever belief will lead to the best outcome.
In Newcomb’s Problem, my intuition is roughly, “well, if I’m the kind of person who one-boxes, then I will get $1,000,000 with high probability, whereas if I’m the kind of person who two-boxes, I will almost definitely get $1,000.” So in that moment I decide to be the kind of person who one-boxes; but the predictor, in order to be extremely reliable, probably already knew that this was my decision theory. There’s no way to trick the predictor by changing my decision theory at the last moment, because if I was the kind of person who did that kind of thing, they probably would have predicted it.
While there is always a chance my reasoning is flawed or that this is the one time the predictor makes a mistake, I am told the predictor is extremely reliable, which I presume means over 99% reliable, and without knowing anything else about the situation I must assume this extremely reliable prediction is also true about me. So even if there’s a 1% chance I’m wrong, by changing my decision I would lose at least 99% x $1,000,000 in exchange for a $1,000 gain. Even after accounting for diminishing returns on money, I think a 99% chance of $1,000,000 is much better than a 100% chance of $1,000 plus a 1% chance of $1,000,000.
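As a minimal sketch of that arithmetic (using the 99% reliability figure I assumed above and the standard $1,000,000 / $1,000 box contents):

```python
# Toy expected-value comparison for Newcomb's Problem.
# The 99% reliability is my own assumption from above; the box contents
# are the standard ones from the problem statement.

P_CORRECT = 0.99        # assumed predictor reliability
BIG = 1_000_000         # opaque box, filled only if one-boxing was predicted
SMALL = 1_000           # transparent box, always present

# One-boxing: I get the big box only when the predictor correctly foresaw it.
ev_one_box = P_CORRECT * BIG

# Two-boxing: I always get the small box, plus the big box in the rare case
# the predictor wrongly expected me to one-box.
ev_two_box = SMALL + (1 - P_CORRECT) * BIG

print(f"EV(one-box) = ${ev_one_box:,.0f}")  # $990,000
print(f"EV(two-box) = ${ev_two_box:,.0f}")  # $11,000
```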
The best expected outcome results from choosing to one-box; therefore, I choose to one-box, because using this flexible decision theory causes me to get the best results in any given situation. I suppose you could also call this meta-causal decision theory: on any given question, I choose whichever decision theory causes the best results[1].
This helps with situations like the simulation hypothesis, where even if you think there’s a very good chance you’re in a simulation, as an altruist you can just choose to act as though you are in basement reality, as long as the altruistic expected value of doing so is higher than that of your other options[2].
It’s also great for Pascal’s mugging. As Pascal’s enticements/threats approach infinity, you just have to calculate the chances that you could achieve infinite value by taking some other approach, like investing the money he’s trying to take from you into longtermism instead, and hoping that we will eventually figure out a way to create some kind of perpetual motion machine with unknown physics that achieves infinite value. If the expected value of that reasonably seems higher, then that would be a better use of the money.
In all cases, you just have to figure out what your goal is, and calculate the expected value of each approach you could take, sometimes coming up with clever or weird approaches that help you resolve the weirdness of the problem. You have to be proactive in choosing your values and coming up with novel solutions, while at the same time taking seriously and calculating the effects of things like unreasonably reliable predictors, anthropic reasoning about simulations, and infinite value.
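Here is a minimal sketch of that procedure. Every option, probability, and payoff below is a made-up placeholder, and huge finite payoffs stand in for "infinite value"; the point is only the shape of the calculation:

```python
# Generic "pick whichever approach has the best expected value" sketch.
# All options, probabilities, and payoffs are hypothetical placeholders.

options = {
    "pay the mugger":        [(1e-12, 1e15), (1 - 1e-12, -100)],
    "invest in longtermism": [(1e-6, 1e12), (1 - 1e-6, 0)],
    "do nothing":            [(1.0, 0)],
}

def expected_value(outcomes):
    """Sum of probability-weighted payoffs for one option."""
    return sum(p * payoff for p, payoff in outcomes)

for name, outcomes in options.items():
    print(f"{name:>22}: EV = {expected_value(outcomes):,.2f}")

best = max(options, key=lambda name: expected_value(options[name]))
print("Best option:", best)
```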
Saving The World
I think this can also be really important when deciding whether or not you should try to take extremely ambitious, extremely high expected value action before knowing how you could possibly succeed, even in the face of a very strong prior that success is unlikely:
By accepting the prior as fate without deeper analysis, you thereby cause yourself to fall in with the overwhelming majority of people who do not succeed at extremely ambitious, extremely high expected value projects, because you are much less likely to really try to succeed, or even to try at all.
By objectively analyzing your situation with crystalline clarity, then asking how likely you would be to succeed if you believed with absolute certainty that you could find a way to succeed, i.e. if you had virtually infinite growth mindset/self-efficacy/belief that you will find a way even if you have to try a million different approaches, you take into account the fact that acting on this belief causes you to be the kind of person who is much more likely to succeed at such projects. As long as the all-things-considered expected value is your best option, you take that course of action.
When you decide to act as 'the kind of person who succeeds at absurdly difficult projects,' such as literally saving the world, you don't ignore the absurd difficulty; you engage with it differently. You simultaneously maintain full awareness of the countless ways to fail while acting from the assumption that solutions exist for each obstacle. This dual mindset, radical agency paired with radical realism, changes how you approach problems. You search harder for solutions because you are choosing to act on the belief that they exist.
You still have to deeply examine the situation and do the kinds of things that will make you that kind of person, i.e. figure out all of the characteristics that you and your environment would need to have, and, if you don’t have them, find ways of effectively building those characteristics and of putting yourself in the right type of environment. You also need to accept the overwhelming chance that you will fail, and nonetheless act as though you had 100% full control over your success, constantly looking for ways you could fail, and constantly course-correcting to put yourself on a realistic track for success.
If you have a more simplistic epistemic/decision theory where you take things at face value, without considering how your own worldview affects the outcome, then you won’t be likely to seek out the extremely improbable path you would need to take in order to become the kind of person who literally saves the world.
So yes, you need to think extremely carefully about the situation, and understand everything extremely well, but this also includes all of the sets of beliefs that you could have, and the ways in which acting on those sets of beliefs will affect the outcome you hope to achieve.
It is not enough to know that making yourself personally responsible for saving the world is “crazy” or “unreasonable.”
The stance required to take heroic, ultimate responsibility, in a pragmatic way, comes from knowing that it’s crazy and unreasonable to expect this from yourself; knowing that you will need to invent or discover unusual actions and innovative solutions every step of the way that allow you to become that person; knowing that you will need full, realistic knowledge of all the ways that this path seems impossible, while still believing it is possible; and, taking account of all of this with clear eyes, taking responsibility for saving the world anyway.
Note: this is not an endorsement or exhortation that someone not called to such a path should take it. Not everyone is willing to get their clothes wet to save a drowning child, and that’s totally fine. Moreover, not everyone can swim, and the expected value of trying to save a drowning child when you can’t swim is actually quite low, probably negative, and therefore not recommended under pragmatic decision theory.
- ^
Not strictly, of course. I suppose you could design a thought experiment that defeats this by putting me in a situation where, if I use pragmatic decision theory, I automatically lose; then I wouldn’t be able to escape by any decision I make based on pragmatic decision theory.
- ^
To be clear, I think it still requires extremely powerful reasoning about very complicated anthropic factors to get the right answer on expected value, but taking this pragmatic approach makes getting an 80/20 answer that is good enough for practical purposes much easier, and helps you make the right decision even if you are almost definitely in a simulation.