Explaining your life with self-reflective AIXI (an interlude)

 

This post uses a vivid allegory to map the decision-making and developing self-knowledge of an artificial intelligence (AI) onto the course of a human life. Beginning with an infant's helpless experience of raw sensory input, it traces how the agent moves from recognizing patterns and causal relationships to forming an action strategy ($\pi^S$), and finally to long-horizon planning and an understanding of sophisticated decision theory. It explores how the agent learns and adapts through the interplay of actions and percepts, and how it refines its decisions through "options" and "commitments." The post also touches on the challenges the agent faces around unpredictability, self-improvement, and interaction with a "smarter" future self, drawing analogies to human decision theories (such as AEDT) and intuitions.

💡 **From passive perception to active action:** The post likens the agent's early development to an infant's growth, emphasizing the transition from passively receiving sensory information (percepts, $e_t$) to actively executing actions ($a_t$) that influence the environment. By recognizing associations between "good" actions ($a_t$) and "bad" percepts ($e_t$), the agent gradually learns to control its environment and comes to treat some percepts as controllable "actions."

📈 **Forming a strategy and planning ahead:** As experience accumulates, the agent recognizes patterns across time, forms a predictor of its own behavior ($\zeta$), and generalizes it into a pattern of behavior ($\pi^S$). This enables short- and long-term planning, even cooperation with a "future self," much as a person plans a day or a week ahead, while recognizing that plans may have to change in the face of surprises (such as sudden danger).

🤔 **Understanding complex decisions and self-knowledge:** The post goes on to examine how the agent handles more complex decision scenarios, such as interacting with a "clone" (an analogue of the prisoner's dilemma) and making sense of the behavior of a "smarter" future self. This touches on more advanced decision theories, such as functional decision theory, and suggests that the agent's self-knowledge and decision-making may tend toward deeper understanding and adaptation.

🧠 **The challenge of predicting a smarter future self:** The post notes that the agent struggles to predict the behavior of a future self that has "gotten smarter," much as humans find it hard to predict the thoughts of someone more intelligent than themselves. This uncertainty may push the agent toward conservative or pessimistic strategies, like a human reacting to a potential threat, which points to the inherent difficulty of intelligence amplification and self-prediction.

Published on July 23, 2025 12:57 AM GMT

Epistemic status: An (informal) allegory for AEDT with rOSI using your entire life experience as an example. The linked post mathematically investigates the resulting agent "self-reflective AIXI." In a way, this post interprets self-reflective AIXI as a formal model of @Daniel Herrmann's desirability tracking (as I understand it - I'm still reading his thesis). 

Ahhhh. There's so much going on! What are these... lights, colors, noise?? This is terrible. You've been born, for better or worse. You are experiencing things. Most of them bad. Let's call all of these things $e$, so we have something to curse to damnation.

Okay, but you're not suffering through all of $e$ at once. You experienced some of them already. You've been born. You are experiencing others now. Okay. Time is a thing.

Unfortunately, this experience seems likely to continue, and we may be forced to consider the future. A general experience-moment we will call $e_t$. If you were a robot, that might mean a specific massive tensor of binary numbers or something. But you're not a robot (again, unfortunate), so it's instead a collection of all the things your senses are telling you right now.

Though none of what is happening is great, some parts of it are worse than others. Sometimes you're too hot, sometimes you're too cold, sometimes you get wet for some reason. 

Also, it seems like some of these bits - I mean, experiences - tend to precede less bad things.

Well, those $a_t$ things sort of seem to be on your side. That's something. They have the same subscript $t$ as $e_t$ for a reason - we could think of them as coming just before $e_t$, or as being kind of simultaneously mixed in with $e_t$; it doesn't really matter.

There seem to be parts of $e_t$ that usually correspond to whatever happened with $a_t$. Like, something in $a_t$ was different than in $a_{t-1}$, and then that awful wailing noise went away in $e_t$. And then $a_t$ and $a_{t+1}$ were really weird, and this strange blob started waving around in $e_{t+1}$. Something is definitely going on here.

It seems like... some of the parts of $e_t$ are controlled by $a_t$! Hey, maybe we just put those parts in the wrong bucket; they must be good too by association, right? Hmm, it's kind of hard to draw the line, really. Well, we'll shuffle some of them around a little bit - it looks like $a_t$ is eating part of $e_t$ (in that giant binary matrix, the patches we highlighted and clustered together are expanding, but with blurred edges - conditioning on some subset of the bits determines a larger subset with very high probability, so that it is almost equivalent to conditioning on them too).
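(To put a toy number on that parenthetical - everything below is invented for illustration, not from the post - here is what "conditioning on some bits almost determines a larger set" looks like: if the intended hand-movement bits pin down the observed hand-percept bits except for rare fumbles, the percept patch might as well join the action bucket.)

```python
import numpy as np

# Invented toy model: 4 "action" bits you set directly, and 4 "hand" percept
# bits that copy them except for rare motor noise.
rng = np.random.default_rng(0)
intended = rng.integers(0, 2, size=(10_000, 4))          # bits in the a_t bucket
fumble = rng.random((10_000, 4)) < 0.02                  # occasional flips
observed = np.logical_xor(intended, fumble).astype(int)  # bits showing up in e_t

# How often does conditioning on the intended patch fix the observed patch?
print((observed == intended).all(axis=1).mean())  # ~0.92: close enough to merge
```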

Alright, things are starting to make sense. You're not going to be a baby about this situation anymore. You're taking control. Those $a_t$ things (let's call them actions) work for you; you can make them do whatever you want. And not only that, but parts of $e_t$ (which we call percepts) belong to you too. That thing waving in front of your face? You did that. That was your hand. You can decide what it does - well, mostly. You can decide to do something with your hand and it almost always happens how you imagined and expected, with occasional fumbles. So for practical purposes, that's usually part of $a_t$, but it's not exactly the same; there are levels to your control.

You flop around a lot. You bite stuff. You generally make stuff happen, preferably the good stuff and not the bad stuff.

And - you're definitely starting to notice patterns across time. You control roughly the same bucket of things at each $t$. That means... hey, it's $t$ now. You're probably going to be able to decide $a_t$ too. And knowing you, you'll do it in sort of a similar way to how you've been doing it so far.

In other words, you can predict your own general strategy for making decisions which are in your control - we can call that $\zeta$. It seems like usually, $\zeta$ says you do whatever you expect to lead to good things, a general pattern of behavior that we can call $\pi^S$.
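(A minimal sketch of this in Python - the toy world model standing in for $\zeta$, the habit standing in for $\pi^S$, and the payoffs are all invented: condition on each currently available action, let the self-model predict your own future actions, and take whichever current action looks best in expectation.)

```python
ACTIONS = ["cry", "wave", "bite"]
UTILITY = {"bad": -1.0, "ok": 0.0, "good": 1.0}

def zeta_percept(history, a):
    """Toy stand-in for zeta's percept prediction P(e_t | history, a_t)."""
    best = {"wave": "good", "bite": "ok", "cry": "bad"}
    return {e: (0.8 if e == best[a] else 0.1) for e in UTILITY}

def pi_S(history):
    """Toy self-model pi^S: a habit -- repeat your last action, else wave."""
    return history[-1][0] if history else "wave"

def value(history, horizon):
    """Expected utility to go, with future actions *predicted* by pi^S."""
    if horizon == 0:
        return 0.0
    a = pi_S(history)
    return sum(p * (UTILITY[e] + value(history + [(a, e)], horizon - 1))
               for e, p in zeta_percept(history, a).items())

def act(history, horizon=3):
    """AEDT-flavored choice: condition on each a_t, trust pi^S afterwards."""
    return max(ACTIONS, key=lambda a: sum(
        p * (UTILITY[e] + value(history + [(a, e)], horizon - 1))
        for e, p in zeta_percept(history, a).items()))

print(act([]))  # -> "wave"
```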

You learn to cooperate with your future self, on the assumption that your future self will probably also cooperate with your future future self. Before long, you can put one foot in front of the other and walk across the room, even though in theory you might just decide to stop doing that and fall over, which would suck more than lying down from the beginning. You're starting to feel pretty confident about this $\pi^S$ deal. Things continue in this fashion for some time.

Okay, so you understand how the world works now. You're a young adult. You regularly plan days or weeks ahead. But you also know that you sometimes fail to follow through on those plans. Sometimes you slack off and don't do what previous-you had in mind. Sometimes you do something wacky that doesn't look anything like $\pi^S$. This is particularly frequent if you're drunk or sleep-deprived. So what would tend to make you not follow $\pi^S$ is somewhat predictable. Or maybe - what would make you follow $\pi^S$, but what $\pi^S$ would do if very stupid; functionally, those things look pretty similar from the perspective of your sober, well-rested self. Like that time you tried to kiss- well, never mind, anyway...

Now you have a pretty good handle on how this works. You can think of whole strings of actions as really just one mega-action (a so-called "option"), and when planning ahead you might as well just plan the whole thing (condition on it). But of course you don't take this to imply ridiculous things - if you plan $a_t$ and $a_{t+1}$, and then $e_t$ is just wild, not what you were expecting at all - you were chopping some onions to make soup and a masked assassin jumped through the window - obviously you will adapt and use the knife for something else.[1] Similarly, planning to make soup does not prevent an assassin from jumping through your window. The same mathematics that allows you to approximately condition on those combinations of actions also tells you this - expecting otherwise is a misconception about what is happening, because you really only control $a_t$ for $t =$ now, and choices at this moment are simply predictive of your later choices. One of the things you can do at $t$ is make a commitment. That looks sort of like conditioning on future actions, but only as long as the intervening percepts are such that you will continue to honor those commitments - and if you chose to honor them blindly, the percepts would simply become independent of your actions, such that planning certain actions would still clearly have no window-protective effect.
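(A tiny numeric check of that last point, with invented probabilities: if $\zeta$ treats the assassin-percept as independent of which option you condition on, planning soup moves no probability mass at all.)

```python
# Invented numbers: under zeta, the "assassin" percept is independent of the
# planned option, so conditioning on the plan changes nothing.
P_PLAN = {"make_soup": 0.7, "order_pizza": 0.3}
P_ASSASSIN = 0.001

joint = {(plan, hit): p * (P_ASSASSIN if hit else 1 - P_ASSASSIN)
         for plan, p in P_PLAN.items() for hit in (True, False)}

print(joint[("make_soup", True)] / P_PLAN["make_soup"])  # 0.001 (up to float noise)
print(P_ASSASSIN)                                        # 0.001 -- soup is not a shield
```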

In fact, with longer-time-horizon options, you mostly condition on certain combinations of responses to relatively predictable percepts - a "partial" policy - not on the combination of actions itself. As long as this partial policy routes you far away from things that could mess up your ability to follow $\pi^S$ at all (you never use psychedelics), you pretty much do exactly the policy that you most prefer to do (rough proof here), which we might call the optimal policy $\pi^*$.
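(One invented picture of a partial policy: a lookup from anticipated percepts to responses, defined nowhere else - anything off-script drops you back into full deliberation.)

```python
# Invented example: the "make soup" option as a partial policy, defined only
# on the percepts it anticipates.
make_soup = {
    "onions_on_board": "chop_onions",
    "onions_chopped":  "add_to_pot",
    "pot_simmering":   "stir",
}

def step(partial_policy, percept, replan):
    if percept in partial_policy:   # the world stayed relatively predictable
        return partial_policy[percept]
    return replan(percept)          # the masked-assassin branch

print(step(make_soup, "pot_simmering", lambda e: "improvise"))       # stir
print(step(make_soup, "assassin_in_window", lambda e: "improvise"))  # improvise
```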

Eventually, you stumble upon LessWrong and learn about weird acausal trades like playing the prisoner's dilemma with a clone of yourself and hopefully cooperating rather than defecting and -

WAS EVERYTHING YOU THOUGHT YOU KNEW WRONG?

Well, maybe you need some kind of exotic "functional decision theory," or maybe you don't - but at least that example seems to be handled very nicely in the same framework you've been using all along. It just happens that certain percept bits (the actions of your clone) are also very well predicted by your own actions. In a way, they're like actions available to your extended self, the larger pattern behind yourself. Push this far enough, and maybe you end up in the same place as those exotic decision theories! But in ordinary situations at least, everything pretty much adds up to normality and this system you've worked out (an approximation to AEDT with rOSI?) explains everything pretty well.
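(Worked out with invented payoffs and an invented 0.99 correlation, the clone case really is just conditioning as usual: your own action is strong evidence about the clone-shaped percept bits.)

```python
# Your utility, indexed by (your move, clone's move); numbers invented.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}
P_MATCH = 0.99  # zeta's credence that the clone's action mirrors yours

def expected_utility(mine):
    other = "D" if mine == "C" else "C"
    return (P_MATCH * PAYOFF[(mine, mine)]
            + (1 - P_MATCH) * PAYOFF[(mine, other)])

print(expected_utility("C"), expected_utility("D"))  # 2.97 1.03 -> cooperate
```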

Except, it does seem to get hung up on predicting what your future self will do when you've gotten smarter instead of dumber. That seems like a really tricky problem, but then again, who could predict what someone smarter than themselves would do? Maybe there's no better answer than "make an informed guess; whatever it is, it'll probably be more clever than anything I can come up with." Maybe there's a better system, but it's not built into your subconscious mind in the same way, because intelligence boosting hasn't historically been available to humans - so you have to figure it out and reason it through consciously.

Though, maybe you do have some built-in instincts about this. At least when it comes to other people smarter than you, you tend to be paranoid - who knows what they might do? If they're working actively against you (say, in a game of chess) you might guess that things will work out about as badly as they reasonably could. Perhaps this kind of pessimism means you aren't really using probabilities anymore - or perhaps it's simply the natural result of your ordinary reasoning process, scraping predictive power out of a particularly cynical heuristic...  
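(One invented illustration of that cynical heuristic: against an opponent strong enough to find their best reply, scoring your moves by your worst case tracks reality better than modeling them as a coin flip.)

```python
# Invented payoffs for you, indexed by (your move, their reply).
MOVES = {"attack": {"parry": -5, "blunder": 9},
         "defend": {"parry": 1, "blunder": 2}}

def naive(move):      # pretend the smarter opponent picks at random
    replies = MOVES[move].values()
    return sum(replies) / len(replies)

def paranoid(move):   # assume they find their best reply (your worst case)
    return min(MOVES[move].values())

print(max(MOVES, key=naive))     # attack -- and the strong player just parries
print(max(MOVES, key=paranoid))  # defend -- pessimism keeps your pieces
```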

  1. ^

    Not that you've ever done this. It would be cool, though.



