Explaining your life with self-reflective AIXI (an interlude)

 

This post uses a vivid allegory to map the decision-making and developing self-knowledge of an artificial intelligence (AI) onto the course of a human life. Beginning with an infant's helpless experience of raw sensory input, it traces how the agent moves from recognizing patterns and causal relationships to forming an action strategy ($\pi^S$), and finally to long-horizon planning and an understanding of sophisticated decision theory. It explores how the agent learns and adapts through the interplay of actions and percepts, and how it refines its decisions through "options" and "commitments." The post also touches on the challenges the agent faces around unpredictability, self-improvement, and interaction with a "smarter" future self, drawing analogies to human decision theories (such as AEDT) and intuitions.

💡 **From passive perception to active action:** The post likens the agent's early development to an infant's growth, emphasizing the transition from passively receiving sensory information (percepts, $e_t$) to actively executing actions ($a_t$) that influence the environment. By recognizing associations between "good" actions ($a_t$) and "bad" percepts ($e_t$), the agent gradually learns to control its environment and comes to treat some percepts as controllable "actions."

📈 **Forming a strategy and planning ahead:** As experience accumulates, the agent recognizes patterns across time, forms a predictor of its own behavior ($\zeta$), and generalizes it into a pattern of behavior ($\pi^S$). This enables short- and long-term planning, even cooperation with a "future self," much as a person plans a day or a week ahead, while recognizing that plans may have to change in the face of surprises (such as sudden danger).

🤔 **Understanding complex decisions and self-knowledge:** The post goes on to examine how the agent handles more complex decision scenarios, such as interacting with a "clone" (an analogue of the prisoner's dilemma) and making sense of the behavior of a "smarter" future self. This touches on more advanced decision theories, such as functional decision theory, and suggests that the agent's self-knowledge and decision-making may tend toward deeper understanding and adaptation.

🧠 **The challenge of predicting a smarter future self:** The post notes that the agent struggles to predict the behavior of a future self that has "gotten smarter," much as humans find it hard to predict the thoughts of someone more intelligent than themselves. This uncertainty may push the agent toward conservative or pessimistic strategies, like a human reacting to a potential threat, which points to the inherent difficulty of intelligence amplification and self-prediction.

Published on July 23, 2025 12:57 AM GMT

Epistemic status: An (informal) allegory for AEDT with rOSI using your entire life experience as an example. The linked post mathematically investigates the resulting agent "self-reflective AIXI." In a way, this post interprets self-reflective AIXI as a formal model of @Daniel Herrmann's desirability tracking (as I understand it - I'm still reading his thesis). 

Ahhhh. There's so much going on! What are these... lights, colors, noise?? This is terrible. You've been born, for better or worse. You are experiencing things. Most of them bad. Let's call all of these things $e$, so we have something to curse to damnation.

Okay, but you're not suffering through all of $e$ at once. You experienced some of them already. You've been born. You are experiencing others now. Okay. Time is a thing.

Unfortunately, this experience seems likely to continue, and we may be forced to consider the future. A general experience-moment we will call $e_t$. If you were a robot, that might mean a specific massive tensor of binary numbers or something. But you're not a robot (again, unfortunate), so it's instead a collection of all the things your senses are telling you right now.

Though none of what is happening is great, some parts of it are worse than others. Sometimes you're too hot, sometimes you're too cold, sometimes you get wet for some reason. 

Also, it seems like some of these bits - I mean, experiences - tend to precede less bad things.

Well, those $a_t$ things sort of seem to be on your side. That's something. They have the same subscript $t$ as $e_t$ for a reason - we could think of them as coming just before $e_t$, or as being kind of simultaneously mixed in with $e_t$; it doesn't really matter.

There seem to be parts of $e_t$ that usually correspond to whatever happened with $a_t$. Like, something in $a_t$ was different than in $a_{t-1}$, and then that awful wailing noise went away in $e_t$. And then $a_t$ and $a_{t+1}$ were really weird, and this strange blob started waving around in $e_{t+1}$. Something is definitely going on here.

It seems like... some of the parts of $e_t$ are controlled by $a_t$! Hey, maybe we just put those parts in the wrong bucket; they must be good too by association, right? Hmm, it's kind of hard to draw the line, really. Well, we'll shuffle some of them around a little bit - it looks like $a_t$ is eating part of $e_t$ (in that giant binary matrix, the patches we highlighted and clustered together are expanding, but with blurred edges - conditioning on some subset of the bits determines a larger subset with very high probability, so that it is almost equivalent to conditioning on them too).
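(To put a toy number on that parenthetical - everything below is invented for illustration, not from the post - here is what "conditioning on some bits almost determines a larger set" looks like: if the intended hand-movement bits pin down the observed hand-percept bits except for rare fumbles, the percept patch might as well join the action bucket.)

```python
import numpy as np

# Invented toy model: 4 "action" bits you set directly, and 4 "hand" percept
# bits that copy them except for rare motor noise.
rng = np.random.default_rng(0)
intended = rng.integers(0, 2, size=(10_000, 4))          # bits in the a_t bucket
fumble = rng.random((10_000, 4)) < 0.02                  # occasional flips
observed = np.logical_xor(intended, fumble).astype(int)  # bits showing up in e_t

# How often does conditioning on the intended patch fix the observed patch?
print((observed == intended).all(axis=1).mean())  # ~0.92: close enough to merge
```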

Alright, things are starting to make sense. You're not going to be a baby about this situation anymore. You're taking control. Those $a_t$ things (let's call them actions) work for you; you can make them do whatever you want. And not only that, but parts of $e_t$ (which we call percepts) belong to you too. That thing waving in front of your face? You did that. That was your hand. You can decide what it does - well, mostly. You can decide to do something with your hand and it almost always happens how you imagined and expected, with occasional fumbles. So for practical purposes, that's usually part of $a_t$, but it's not exactly the same; there are levels to your control.

You flop around a lot. You bite stuff. You generally make stuff happen, preferably the good stuff and not the bad stuff.

And - you're definitely starting to notice patterns across time. You control roughly the same bucket of things at each $t$. That means... hey, it's $t$ now. You're probably going to be able to decide $a_t$ too. And knowing you, you'll do it in sort of a similar way to how you've been doing it so far.

In other words, you can predict your own general strategy for making decisions which are in your control - we can call that $\zeta$. It seems like usually, $\zeta$ says you do whatever you expect to lead to good things, a general pattern of behavior that we can call $\pi^S$.
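(A minimal sketch of this in Python - the toy world model standing in for $\zeta$, the habit standing in for $\pi^S$, and the payoffs are all invented: condition on each currently available action, let the self-model predict your own future actions, and take whichever current action looks best in expectation.)

```python
ACTIONS = ["cry", "wave", "bite"]
UTILITY = {"bad": -1.0, "ok": 0.0, "good": 1.0}

def zeta_percept(history, a):
    """Toy stand-in for zeta's percept prediction P(e_t | history, a_t)."""
    best = {"wave": "good", "bite": "ok", "cry": "bad"}
    return {e: (0.8 if e == best[a] else 0.1) for e in UTILITY}

def pi_S(history):
    """Toy self-model pi^S: a habit -- repeat your last action, else wave."""
    return history[-1][0] if history else "wave"

def value(history, horizon):
    """Expected utility to go, with future actions *predicted* by pi^S."""
    if horizon == 0:
        return 0.0
    a = pi_S(history)
    return sum(p * (UTILITY[e] + value(history + [(a, e)], horizon - 1))
               for e, p in zeta_percept(history, a).items())

def act(history, horizon=3):
    """AEDT-flavored choice: condition on each a_t, trust pi^S afterwards."""
    return max(ACTIONS, key=lambda a: sum(
        p * (UTILITY[e] + value(history + [(a, e)], horizon - 1))
        for e, p in zeta_percept(history, a).items()))

print(act([]))  # -> "wave"
```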

You learn to cooperate with your future self, on the assumption that your future self will probably also cooperate with your future future self. Before long, you can put one foot in front of the other and walk across the room, even though in theory you might just decide to stop doing that and fall over, which would suck more than lying down from the beginning. You're starting to feel pretty confident about this $\pi^S$ deal. Things continue in this fashion for some time.

Okay, so you understand how the world works now. You're a young adult. You regularly plan days or weeks ahead. But you also know that you sometimes fail to follow through on those plans. Sometimes you slack off and don't do what previous-you had in mind. Sometimes you do something wacky that doesn't look anything like $\pi^S$. This is particularly frequent if you're drunk or sleep-deprived. So what would tend to make you not follow $\pi^S$ is somewhat predictable. Or maybe - what would make you follow $\pi^S$, but what $\pi^S$ would do if very stupid; functionally, those things look pretty similar from the perspective of your sober, well-rested self. Like that time you tried to kiss- well, never mind, anyway...

Now you have a pretty good handle on how this works. You can think of whole strings of actions as really just one mega-action (a so-called "option"), and when planning ahead you might as well just plan the whole thing (condition on it). But of course you don't take this to imply ridiculous things - if you plan $a_t$ and $a_{t+1}$, and then $e_t$ is just wild, not what you were expecting at all - you were chopping some onions to make soup and a masked assassin jumped through the window - obviously you will adapt and use the knife for something else.[1] Similarly, planning to make soup does not prevent an assassin from jumping through your window. The same mathematics that allows you to approximately condition on those combinations of actions also tells you this - expecting otherwise is a misconception about what is happening, because you really only control $a_t$ for $t =$ now, and choices at this moment are simply predictive of your later choices. One of the things you can do at $t$ is make a commitment. That looks sort of like conditioning on future actions, but only as long as the intervening percepts are such that you will continue to honor those commitments - and if you chose to honor them blindly, the percepts would simply become independent of your actions, such that planning certain actions would still clearly have no window-protective effect.
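(A tiny numeric check of that last point, with invented probabilities: if $\zeta$ treats the assassin-percept as independent of which option you condition on, planning soup moves no probability mass at all.)

```python
# Invented numbers: under zeta, the "assassin" percept is independent of the
# planned option, so conditioning on the plan changes nothing.
P_PLAN = {"make_soup": 0.7, "order_pizza": 0.3}
P_ASSASSIN = 0.001

joint = {(plan, hit): p * (P_ASSASSIN if hit else 1 - P_ASSASSIN)
         for plan, p in P_PLAN.items() for hit in (True, False)}

print(joint[("make_soup", True)] / P_PLAN["make_soup"])  # 0.001 (up to float noise)
print(P_ASSASSIN)                                        # 0.001 -- soup is not a shield
```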

In fact, with longer-time-horizon options, you mostly condition on certain combinations of responses to relatively predictable percepts - a "partial" policy - not on the combination of actions itself. As long as this partial policy routes you far away from things that could mess up your ability to follow $\pi^S$ at all (you never use psychedelics), you pretty much do exactly the policy that you most prefer to do (rough proof here), which we might call the optimal policy $\pi^*$.
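(One invented picture of a partial policy: a lookup from anticipated percepts to responses, defined nowhere else - anything off-script drops you back into full deliberation.)

```python
# Invented example: the "make soup" option as a partial policy, defined only
# on the percepts it anticipates.
make_soup = {
    "onions_on_board": "chop_onions",
    "onions_chopped":  "add_to_pot",
    "pot_simmering":   "stir",
}

def step(partial_policy, percept, replan):
    if percept in partial_policy:   # the world stayed relatively predictable
        return partial_policy[percept]
    return replan(percept)          # the masked-assassin branch

print(step(make_soup, "pot_simmering", lambda e: "improvise"))       # stir
print(step(make_soup, "assassin_in_window", lambda e: "improvise"))  # improvise
```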

Eventually, you stumble upon LessWrong and learn about weird acausal trades like playing the prisoner's dilemma with a clone of yourself and hopefully cooperating rather than defecting and -

WAS EVERYTHING YOU THOUGHT YOU KNEW WRONG?

Well, maybe you need some kind of exotic "functional decision theory," or maybe you don't - but at least that example seems to be handled very nicely in the same framework you've been using all along. It just happens that certain percept bits (the actions of your clone) are also very well predicted by your own actions. In a way, they're like actions available to your extended self, the larger pattern behind yourself. Push this far enough, and maybe you end up in the same place as those exotic decision theories! But in ordinary situations at least, everything pretty much adds up to normality and this system you've worked out (an approximation to AEDT with rOSI?) explains everything pretty well.
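(Worked out with invented payoffs and an invented 0.99 correlation, the clone case really is just conditioning as usual: your own action is strong evidence about the clone-shaped percept bits.)

```python
# Your utility, indexed by (your move, clone's move); numbers invented.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}
P_MATCH = 0.99  # zeta's credence that the clone's action mirrors yours

def expected_utility(mine):
    other = "D" if mine == "C" else "C"
    return (P_MATCH * PAYOFF[(mine, mine)]
            + (1 - P_MATCH) * PAYOFF[(mine, other)])

print(expected_utility("C"), expected_utility("D"))  # 2.97 1.03 -> cooperate
```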

Except, it does seem to get hung up on predicting what your future self will do when you've gotten smarter instead of dumber. That seems like a really tricky problem, but then again, who could predict what someone smarter than themselves would do? Maybe there's no better answer than "make an informed guess; whatever it is, it'll probably be more clever than anything I can come up with." Maybe there's a better system, but it's not built into your subconscious mind in the same way, because intelligence boosting hasn't historically been available to humans - so you have to figure it out and reason it through consciously.

Though, maybe you do have some built-in instincts about this. At least when it comes to other people smarter than you, you tend to be paranoid - who knows what they might do? If they're working actively against you (say, in a game of chess) you might guess that things will work out about as badly as they reasonably could. Perhaps this kind of pessimism means you aren't really using probabilities anymore - or perhaps it's simply the natural result of your ordinary reasoning process, scraping predictive power out of a particularly cynical heuristic...  
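(One invented illustration of that cynical heuristic: against an opponent strong enough to find their best reply, scoring your moves by your worst case tracks reality better than modeling them as a coin flip.)

```python
# Invented payoffs for you, indexed by (your move, their reply).
MOVES = {"attack": {"parry": -5, "blunder": 9},
         "defend": {"parry": 1, "blunder": 2}}

def naive(move):      # pretend the smarter opponent picks at random
    replies = MOVES[move].values()
    return sum(replies) / len(replies)

def paranoid(move):   # assume they find their best reply (your worst case)
    return min(MOVES[move].values())

print(max(MOVES, key=naive))     # attack -- and the strong player just parries
print(max(MOVES, key=paranoid))  # defend -- pessimism keeps your pieces
```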

  1. ^

    Not that you've ever done this. It would be cool, though.



