Published on August 27, 2024 11:17 PM GMT

In a 2016 blog post, Paul Christiano argued that the universal prior (hereafter "UP") may be "malign." His argument has received a lot of follow-up discussion, e.g. in the post "The Solomonoff prior is malign. It's not a big deal." and in other posts.

This argument never made sense to me. The reason it doesn't make sense to me is pretty simple, but I haven't seen it mentioned explicitly in any of the ensuing discussion.

This leaves me feeling like either I am misunderstanding the argument in a pretty fundamental way, or there is a problem with the argument that has gotten little attention from the argument's critics (in which case I don't understand why).

I would like to know which of these is the case, and correct my misunderstanding if it exists, hence this post.

(Note: In 2018 I wrote a comment on the original post where I tried to state one of my objections to the argument, though I don't feel I expressed myself especially well there.)

UP-using "universes" and simulatable "universes"

The argument for malignity involves reasoning beings, instantiated in Turing machines (TMs), which try to influence the content of the UP in order to affect other beings who are making decisions using the UP.

Famously, the UP is uncomputable.
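For reference, here is one standard way of writing the UP down. The notation (M, U, p, x) is an assumption made for illustration and is not taken from the posts under discussion:

$$
M(x) \;=\; \sum_{p \,:\, U(p) \text{ outputs a string beginning with } x} 2^{-|p|}
$$

Here U is a fixed universal (monotone) Turing machine, p ranges over its (minimal) programs, and |p| is the length of p in bits. The sum can be approximated from below by running more programs for more steps, but there is no way to know in finite time that a still-running program will never contribute, so no algorithm computes M(x) exactly.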

This means the TMs (and reasoning beings inside the TMs) will not be able to use^[1] the UP themselves, or simulate anyone else using the UP. At least not if we take "using the UP" in a strict and literal sense.

Thus, I am unsure how to interpret claims (which are common in presentations of the argument) about TMs "searching for universes where the UP is used" or the like.

For example, from Mark Xu's "The Solomonoff Prior is Malign":

In particular, this suggests a good strategy for consequentialists: find a universe that is using a version of the Solomonoff prior that has a very short description of the particular universe the consequentialists find themselves in.

Or, from Christiano's original post:

So the first step is getting our foot in the door—having control over the parts of the universal prior that are being used to make important decisions.
This means looking across the universes we care about, and searching for spots within those universe where someone is using the universal prior to make important decisions. In particular, we want to find places where someone is using a version of the universal prior that puts a lot of mass on the particular universe that we are living in, because those are the places where we have the most leverage.
Then the strategy is to implement a distribution over all of those spots, weighted by something like their importance to us (times the fraction of mass they give to the particular universe we are in and the particular channel we are using). That is, we pick one of those spots at random and then read off our subjective distribution over the sequence of bits that will be observed at that spot (which is likely to involve running actual simulations).

What exactly are these "universes" that are being searched over? We have two options:

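1. Universes in which the UP itself is used, in its full uncomputable form. These are not computable universes, so no TM can simulate them.
2. Computable universes, which a TM can simulate. The UP itself cannot be used in them, but something computable that resembles or approximates the UP might be.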

Option 1 seems hard to square with the talk about TMs "searching for" universes or "simulating" universes. A TM can't do such things to the universes of option 1.

Hence, the argument is presumably about option 2.

That is, although we are trying to reason about the content of the UP itself, the TMs are not "searching over" or "simulating" or "reasoning about" the UP or things containing the UP. They are only doing these things to some other object, which has some (as-yet unspecified) connection to the UP, such as "approximating" the UP in some sense.

But now we face some challenges, which are never addressed in presentations of the argument:

about

can

affect

Some thoughts that one might have

What sort of thing is this not-UP -- the thing that the TMs can simulate and search over?

I don't know; I have never seen any discussion of the topic, and haven't thought about it for very long. That said, here are a few seemingly obvious points about it.

On slowdown

Suppose that we have a TM, with a whole world inside it, and some reasoning beings inside that world.

These beings are aware of some computable, but vaguely "UP-like," reasoning procedure that they think is really great.

In order to be "UP-like" in a relevant way, this procedure will have to involve running TMs, and the set of TMs that might be run needs to include the same TM that implements our beings and their world.

(This procedure needs to differ from the UP by using a computable weighting function for the TMs. It should also be able to return results without having to wait for eternity as the non-halting TMs do their not-halting. The next section will say more about the latter condition.)

Now they want to search through computable universes (by simulation) to look for ones where the UP-esque procedure is being used.

What does it look like when they find one? At this point, we have

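- the "outer" TM, which is simulating a
  - universe, which contains irrelevant galaxies and
    - one special part that is simulating
      - a second universe, which contains irrelevant galaxies and
        - one special part that implements the UP-like procedure, which is running
          - many TMs that aren't the outer TM, and
          - a copy of the outer TM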

Each level of nesting incurs some slowdown relative to just running the "relevant" part of the thing that is being nested, because some irrelevant stuff has to come along for the ride.

It takes many many clock-ticks of the outer TM to advance the copy of it several levels down, because we have to spend a lot of time on irrelevant galaxies and on other TMs involved in the procedure.

(There is also an extra "constant factor" from the fact that we have to wait for the outer TM to evolve life, etc., before we get to the point where it starts containing a copy at all.)

So I don't see how the guys in the outer TM would be able to advance their simulation up to the point where something they can control is being "read off," without finding that in fact this read-off event occurred in their own distant past, and hence is no longer under their control.

To riff on this: the malignity argument involves the fact that the UP puts high weight on simple TMs, but doesn't care about speed, so it may put high weight on TMs that do very long-running things like simulating universes that simulate other universes.

Fine -- but once we start talking about a universe that is simulating itself (in order to reason about UP-like objects that involve it), speed starts to matter for a different reason. If you are simulating yourself, it is always with some slowdown, since you contain parts other than the simulation. You'll never be able to "catch up with yourself" and, e.g., read your own next action off of the simulation rather than choosing it in the ordinary manner.

It's possible that there are ways around this objection, even if it's valid in principle. For instance, maybe the reasoning beings can make inferences about the future behavior of the UP users, jumping ahead of the slow simulation.

It's easy to imagine how this might work for "finding the output channel," since you can just guess that a channel used once will be re-used. But it would be much harder to decide what one's preferred output actually is at "future" points not yet reached in the simulation; here one would effectively need to do futurism about the world in which the procedure is being used, probably on an extremely long time horizon.
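To make the slowdown point concrete, here is a toy calculation. It is only a sketch under strong simplifying assumptions: every level of nesting is assumed to cost the same constant factor `overhead` greater than 1, and `startup` stands in for the time the outer TM spends evolving life before any copy exists. The function names and the numbers are hypothetical.

```python
def outer_ticks_to_reach(event_time: float, overhead: float, depth: int,
                         startup: float = 0.0) -> float:
    """Outer-TM tick at which the nested copy of the outer TM reaches `event_time`.

    event_time: tick (on the outer TM's own timeline) of the "read-off" event.
    overhead:   assumed slowdown factor per level of nesting (must exceed 1,
                because each level also simulates irrelevant stuff).
    depth:      number of nesting levels between the outer TM and its copy.
    startup:    outer ticks that pass before the nested copy starts running.
    """
    if overhead <= 1.0:
        raise ValueError("overhead must be > 1 when the simulator contains other parts")
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # The copy advances one tick for every overhead**depth outer ticks.
    return startup + event_time * overhead ** depth


def lag(event_time: float, overhead: float, depth: int,
        startup: float = 0.0) -> float:
    """How far in the outer TM's past the event is once the copy reaches it."""
    return outer_ticks_to_reach(event_time, overhead, depth, startup) - event_time


if __name__ == "__main__":
    # Toy numbers, chosen only to show the trend as nesting gets deeper.
    for depth in (1, 2, 3):
        print(depth, lag(event_time=1000, overhead=10.0, depth=depth, startup=500))
```

Under these assumptions the lag is positive for every event time, and it grows with both the depth and the time of the event, which is the "never catch up with yourself" problem in arithmetic form. Inference that jumps ahead of the simulation is exactly what this toy model leaves out.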

On efficiency

There are results showing that the UP (or Solomonoff Induction) is in some sense optimal. So it is easy to wind up thinking that, if some procedure is a good idea, it must be (in some sense) an "approximation of" these things.

But the kind of "approximation" involved does not look (in hand-wavey terms) like the ideal thing (UP or SI), plus some unbiased "approximation noise."

The ways that one would deviate from the ideal, when making a practically useful procedure, have certain properties that the ideal itself lacks. In the hand-wavey statistical analogy, the "noise" is not zero-mean.

I noted above that the "UP-like procedure" will need to use a computable weighting function. So, this function can't be Kolmogorov complexity.

And indeed, if one is designing a procedure for practical use, one probably wouldn't want anything like Kolmogorov complexity. All else being equal, one doesn't want to sit around for ages waiting for a TM to simulate a whole universe, even if that TM is "simple." One probably wants to prioritize TMs that can yield answers more quickly.

As noted above, in practice one never has an infinite amount of time to sit around waiting for TMs to (not) halt, so any method that returns results in finite time will have to involve some kind of effective penalty on long-running TMs.

But one may wish to be even more aggressive about speed than simply saying "I'm only willing to wait this long, ignore any TM that doesn't halt before then." One might want one's prior to actively prefer fast TMs over slow ones, even within the range of TMs fast enough that you're willing to wait for them. That way, if at any point you need to truncate the distribution and only look at the really high-mass TMs, the TMs you are spared from running due to the truncation are preferentially selected to be ones you don't want to run (because they're slow).

These points are not original, of course. Everyone talks about the speed prior.
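As a rough illustration of the difference, here is a toy comparison of a pure simplicity weighting with a speed-penalized one. This is a sketch, not Schmidhuber's exact speed prior: the penalty used is a Levin-style cost of description length plus log2 of runtime, and the two "TMs" with their lengths and runtimes are made-up numbers.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ToyTM:
    name: str
    length_bits: int  # assumed description length of the program
    runtime: int      # assumed steps before it yields the output we care about


def simplicity_weight(tm: ToyTM) -> float:
    """UP-style weight: depends only on description length."""
    return 2.0 ** -tm.length_bits


def speed_weight(tm: ToyTM) -> float:
    """Speed-penalized weight: 2^-(length + log2(runtime)) = 2^-length / runtime."""
    return 2.0 ** -(tm.length_bits + math.log2(tm.runtime))


def keep_top(tms: List[ToyTM], weight_fn: Callable[[ToyTM], float], k: int) -> List[ToyTM]:
    """Truncate the distribution to the k highest-weight TMs."""
    return sorted(tms, key=weight_fn, reverse=True)[:k]


# Hypothetical examples: a short program that simulates a whole universe,
# and a longer program that gets to the point quickly.
OUTER_TM = ToyTM("outer_universe", length_bits=20, runtime=2 ** 60)
DIRECT_TM = ToyTM("direct_model", length_bits=40, runtime=2 ** 10)

if __name__ == "__main__":
    for weight_fn in (simplicity_weight, speed_weight):
        survivor = keep_top([OUTER_TM, DIRECT_TM], weight_fn, k=1)[0]
        print(weight_fn.__name__, "keeps", survivor.name)
```

With these numbers the simplicity weighting keeps the slow universe-simulating TM after truncation, while the speed-penalized weighting keeps the fast one. The particular penalty is a design choice; any weighting that trades length against runtime would show the same qualitative flip once the runtime gap is large enough.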

But now, return to our reasoning beings in a TM, simulating a universe, which in turn uses a procedure that's great for practical purposes.

The fact that the procedure is "great for practical purposes" is crucial to the beings' motivation, here; they expect the procedure to actually get used in practice, in the world they're simulating. They expect this because they think it actually is a great idea -- for practical purposes -- and they expect the inner creatures of the simulation to notice this too.

Since the procedure is great for practical purposes, we should expect that it prioritizes efficiently computable TMs, like the speed prior does.

But this means that TMs like the "outer TM" in which our beings live -- which are simple (hence UP cares about them) but slow, having to simulate whole universes with irrelevant galaxies and all before they can get to the point -- are not what the "great for practical purposes" procedure cares about.

Once again: the malignity argument involves the fact that the UP puts high weight on simple TMs, but doesn't care about speed. This is true of the UP. But it is a count against using the UP, or anything like it, for practical purposes.

And so we should not expect the UP, or anything like it, to get used in practice by the kinds of entities we can simulate and reason about.

We (i.e. "reasoning beings in computable universes") can influence the UP, but we can't reason about it well enough to use that influence. Meanwhile, we can reason about things that are more like the speed prior -- but we can't influence them.

The common thread

It feels like there is a more general idea linking the two considerations above.

It's closely related to the idea I presented in "When does rationality-as-search have nontrivial implications?"

Suppose that there is some search process that is looking through a collection of things, and you are an element of the collection. Then, in general, it's difficult to imagine how you (just you) can reason about the whole search in such a way as to "steer it around" in your preferred direction.

If you are powerful enough to reason about the search (and do this well enough for steering), then in some sense the search is unnecessary -- one could delete all the other elements of the search space, and just consult you about what the search might have done.

As stated this seems not quite right, since you might have some approximate knowledge of the search that suffices for your control purposes, yet is "less powerful" than the search as a whole.

For anything like the malignity argument to work, we need this kind of "gap" to exist -- a gap between the power needed to actually use the UP (or the speed prior, or whatever), and the power needed to merely "understand them well enough for control purposes."

Maybe such a gap is possible! It would be very interesting if so.

But this question -- which seems like the question on which the whole thing turns -- is not addressed in any of the treatments I've seen of the malignity argument. Instead, these treatments speak casually of TMs "simulating universes" in which someone is "using" the UP, without addressing where in the picture we are to put the "slack" -- the use of merely-approximate reasoning -- that is necessary for the picture to describe something possible at all.

What am I missing?

^[1] For simplicity, I mostly avoid mentioning Solomonoff Induction in this post, and refer more broadly to "uses" of the UP, whatever these may be.
