Published on May 20, 2025 3:26 PM GMT
In the comments section of "You can, in fact, bamboozle an unaligned AI into sparing your life", both supporters and critics of the idea seemed to agree on two assumptions:
- Surviving civilizations have a chance of rescuing civilizations killed by misaligned AI, though they disagree on the best way of achieving that.
- The big worry is that there are almost 0 surviving civilizations, because if we're unlucky, all civilizations will die the same way.
What if, to ensure that at least some civilizations survive, each civilization picked a random strategy?
Maybe if every civilization follows a random strategy, that increases the chance that at least some of them survive the singularity, and also increases the chance that the average sentient life in all of existence is happy rather than miserable. It reduces logical risk: the risk that every civilization fails in the same, correlated way.
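To make the logical-risk point concrete, here is a toy calculation of my own (not from the linked post): suppose there are two candidate strategies and exactly one of them works, but nobody can tell which. The numbers are made up purely for illustration.

```python
# Toy model: two strategies, exactly one of which works, but no one knows which.
# If all N civilizations reason identically and copy the same choice,
# they live or die together: P(zero survivors) = 0.5.
# If each civilization picks independently at random, a civilization dies
# only when it happens to pick the wrong strategy, so P(zero survivors) = 0.5 ** N.
N = 100
print(0.5)        # correlated choice: probability that zero civilizations survive
print(0.5 ** N)   # independent random choices: ~8e-31
```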
History is already random, but perhaps we could further randomize the strategy we pick.
For example, if the random number generated using Dawson et al.'s method (after July 1st 00:00 UTC, using pulsar PSR J0953+0755 or the first publicly available pulsar data) falls above the 95th percentile, we would all adopt MIRI's extremely pessimistic strategy and do whatever Eliezer Yudkowsky and Nate Soares suggest, with less arguing and more urgency. If they tell you that your AI lab, working on both capabilities and alignment, is a net negative, then you quit and work on something else. If you are more reluctant to do so, you might insist on the 99th percentile instead.
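For concreteness, here is a minimal Python sketch of the kind of commitment rule this describes: hash the same publicly available pulsar timing data into a number in [0, 1) and compare it to the agreed threshold. The hashing rule, file name, and threshold are placeholders of my own, not Dawson et al.'s actual method.

```python
import hashlib

def shared_random_draw(public_data: bytes) -> float:
    """Map publicly verifiable bytes (e.g. pulsar timing data) to a number in [0, 1)."""
    digest = hashlib.sha256(public_data).digest()
    # Interpret the first 8 bytes of the digest as an integer and normalize to [0, 1).
    return int.from_bytes(digest[:8], "big") / 2**64

# Hypothetical usage: everyone downloads the same timing file for PSR J0953+0755
# published after July 1st 00:00 UTC and applies the same rule.
# with open("psr_j0953+0755_timing.dat", "rb") as f:
#     draw = shared_random_draw(f.read())
# adopt_pessimistic_strategy = draw > 0.95   # or 0.99 for the more reluctant
```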
Does this make sense or am I going insane again?
Total utilitarianism objections
If you are a total utilitarian who doesn't care how happy the average life is, only about the total number of happy lives, then you might say this is a bad idea: it increases the chance that at least some civilizations survive, but reduces the total expected number of happy lives.
However, it also reduces the total expected number of miserable lives. If zero civilizations survive, the number of miserable lives may be huge, because misaligned AIs would simulate all possible histories. If even a few civilizations survive, they may trade with those misaligned AIs (causally or acausally) to greatly reduce suffering, since the misaligned AIs gain only a tiny bit by causing astronomical suffering: they lose only a tiny bit of simulation accuracy if they cut the suffering in half.
This idea is only morally bad if you are both a total utilitarian and care only about happiness (without weighting suffering). But really, we should have moral uncertainty and give weight to more than one philosophy (total utilitarianism, average utilitarianism, etc.).