Rationalists are missing a core piece for agent-like structure (energy vs information overload)

This article discusses the agent-like structure problem and the models behind it, including the world's evolution rule, what characterizes an agent, and how it makes decisions. It also covers the information-overload problem of the computationalist worldview and how an energy orientation might resolve it.

🎯 In the agent-like structure problem, the world follows some evolution rule such as x_{n+1} = f(x_n), where x has geometric structure; the agent observes and acts on the world through the informational input/output channels of a Markov blanket, processes its inputs with something like a Bayesian model, and makes decisions with something like argmax.

🚫 Under the computationalist worldview, the "ideal" prior (something like a huge deep neural network) tries to capture all the information, but runs into information overload: it captures large amounts of unimportant information, making the model unwieldy, and it distorts the important information.

💡 An energy orientation may solve information overload. Taking the modelling of physical cubes as an example, the representation should maximize its correlation with the system's energy; but it is hard to get a neural network to model energy, and formalizing this within the computationalist approach would require complex derivations.

Published on August 17, 2024 9:57 AM GMT

The agent-like structure problem is a question about how agents in the world are structured. I think rationalists generally have an intuition that the answer looks something like the following:

- The world follows some evolution rule, e.g. x_{n+1} = f(x_n), where x has geometric structure.
- An agent is characterized by a Markov blanket in the world that has informational input/output channels for the agent to get information to observe the world and send out information to act on it.
- The agent processes its inputs with something like a Bayesian model.
- The agent makes decisions with something like argmax.
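To make that picture concrete, here is a minimal sketch of the loop in code. Everything specific below (the linear dynamics, the noisy scalar observation, the quadratic placeholder utility) is my own illustration, not something from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """World evolution rule x_{n+1} = f(x_n); here a toy linear map plus noise."""
    A = np.array([[0.9, 0.1], [-0.1, 0.9]])
    return A @ x + 0.01 * rng.normal(size=2)

def observe(x):
    """Information crossing the Markov blanket: a noisy partial observation."""
    return x[0] + 0.1 * rng.normal()

def bayes_update(mu, var, y, obs_var=0.01):
    """Something like a Bayesian model: a 1-d Kalman-style measurement update
    on x[0] (ignoring the dynamics, for brevity)."""
    k = var / (var + obs_var)
    return mu + k * (y - mu), (1 - k) * var

def act(mu):
    """Something like argmax: pick the action with the highest estimated utility."""
    actions = [-1.0, 0.0, 1.0]
    utility = lambda a: -(mu + a) ** 2   # placeholder utility: push the estimate toward 0
    return max(actions, key=utility)

x = np.array([1.0, 0.0])   # world state
mu, var = 0.0, 1.0         # agent's belief about x[0]
for _ in range(20):
    y = observe(x)                      # input channel
    mu, var = bayes_update(mu, var, y)  # belief update
    a = act(mu)                         # decision
    x = f(x) + np.array([0.05 * a, 0])  # output channel acts on the world
```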

There is a fairly-obvious gap in the above story, in that it lacks any notion of energy (or entropy, temperature, etc.). I think rationalists mostly feel comfortable with that because:

I've come to think of this as "the computationalist worldview" because functional input/output relationships are the thing that is described very well with computations, whereas laws like conservation of energy are extremely arbitrary from a computationalist point of view. (This should be obvious if you've ever tried writing a simulation of physics, as naive implementations often lead to energy exploding.)
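The energy-explosion point is easy to reproduce; here is a toy illustration of my own (not from the post): a naive explicit-Euler simulation of a frictionless harmonic oscillator steadily gains energy, because nothing in the update rule knows about the conservation law.

```python
import numpy as np

def energy(x, v, k=1.0, m=1.0):
    """Total energy of a mass-on-spring oscillator."""
    return 0.5 * m * v**2 + 0.5 * k * x**2

# Naive explicit Euler: the update never references conservation of energy.
x, v, dt = 1.0, 0.0, 0.1
for step in range(1000):
    x, v = x + dt * v, v - dt * x   # simultaneous update of position and velocity

print(energy(1.0, 0.0))  # initial energy: 0.5
print(energy(x, v))      # much larger: explicit Euler pumps energy into the system
```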

Radical computationalism is killed by information overload

Under the most radical forms of computationalism, the "ideal" prior is something that can range over all conceivable computations. The traditional answer to this is Solomonoff induction, but it is not computationally tractable because it has to process all observed information in every conceivable way.
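For reference, in one standard formulation the Solomonoff prior assigns an observation prefix x the total weight of every program that reproduces it,

$$M(x) \;=\; \sum_{p \,:\, U(p)\ \text{outputs a string beginning with}\ x} 2^{-\ell(p)},$$

where U is a universal (monotone) Turing machine and \ell(p) is the length of program p in bits; summing over all programs is exactly the part that is not tractable in practice.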

Recently with the success of deep learning and the bitter lesson and the Bayesian interpretations of deep double descent and all that, I think computationalists have switched to viewing the ideal prior as something like a huge deep neural network, which learns representations of the world and functional relationships which can be used by some sort of decision-making process.

Briefly, the issue with these sorts of models is that they work by trying to capture all the information that is reasonably non-independent of other information (for instance, the information in a picture that is relevant for predicting information in future pictures). From a computationalist point of view, that may seem reasonable, since this is the information that the functional relationships are about, but outside of computationalism we end up facing two problems:

- The model captures huge amounts of unimportant information, which makes it bloated and unwieldy.
- The important information ends up distorted.
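One way to make "all the reasonably non-independent information" precise (my framing, not the post's) is predictive information: the representation Z_t of the observations so far is pushed toward retaining everything about the past that bears on the future,

$$\max_{Z_t = g(X_{\le t})} \; I(Z_t;\, X_{>t}), \qquad I(Z_t;\, X_{>t}) \;\le\; I(X_{\le t};\, X_{>t}),$$

and nothing in such an objective distinguishes physically trivial but persistent detail from the dynamics you actually care about; both count as predictive information.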

To some extent, human-provided priors (e.g. labels) can reduce these problems, but that doesn't seem scalable, and really, humans sometimes struggle with these problems too. Plus, philosophically, this would amount to partly abandoning radical computationalism.

"Energy"-orientation solves information overload

I'm not sure to what extent we need to focus only on literal energy versus also on various metaphorical kinds of energy like "vitality", but let me set up an example where we can just consider literal energy:

Suppose you have a bunch of physical cubes whose dynamics you want to model. Realistically, you just want the rigid-body dynamics of the cubes. But if your models are supposed to capture information, then they have to model all sorts of weird stuff like scratches on the cubes, complicated lighting scenarios, etc. Arguably, more of the information about (videos of) the cubes may be in these things than in the rigid-body dynamics (which can be described using only a handful of numbers).

The standard approach is to say that the rigid-body dynamics constitute a low-dimensional component that accounts for the biggest chunk of the dynamics. But anecdotally this seems very fiddly and basically self-contradictory (you're trying to simultaneously maximize and minimize information, admittedly in different parts of the model, but still). The real problem is that scratches and lighting and so on are "small" in absolute physical terms, even if they carry a lot of information. E.g. the mass displaced in a scratch is orders of magnitude smaller than the mass of a cube, and the energy in weird light phenomena is smaller than the energy of the cubes (at least if we count mass-energy).
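To put rough numbers on that (every figure below is an assumption I am choosing for illustration, not data from the post):

```python
# Back-of-envelope comparison; all inputs are assumed values for illustration.
c = 3.0e8                     # speed of light, m/s

cube_mass    = 1.0            # kg
cube_speed   = 1.0            # m/s
scratch_mass = 1.0e-9         # ~a microgram of displaced material, kg
light_power  = 1.0e-3         # ~a milliwatt scattered by the scene, W
duration     = 1.0            # one second of video

cube_kinetic     = 0.5 * cube_mass * cube_speed**2  # 0.5 J
cube_mass_energy = cube_mass * c**2                 # ~9e16 J
light_energy     = light_power * duration           # 1e-3 J

print(scratch_mass / cube_mass)         # ~1e-9: scratch mass vs cube mass
print(light_energy / cube_kinetic)      # ~2e-3: light energy vs kinetic energy
print(light_energy / cube_mass_energy)  # ~1e-20: light energy vs mass-energy
```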

So probably we want a representation that maximizes the correlation with the energy of the system, at least more so than we want a representation that maximizes the mutual information with observations of the system.
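In symbols (my notation, not the post's), that is roughly a swap of the target of representation learning, from mutual information with the observations to correlation with the underlying energy:

$$\max_{\theta} \; I\!\big(Z_\theta(X);\, X\big) \quad\longrightarrow\quad \max_{\theta} \; \operatorname{Corr}\!\big(g(Z_\theta(X)),\, E(X)\big),$$

where Z_\theta is the learned representation, E(X) is the (unobserved) energy of the underlying system, and g is some readout from the representation.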

... kinda

The issue is that we can't just tell a neural network to model the energy in a bunch of pictures, because it doesn't have access to the ground truth. Maybe we could fix that with the right loss function, but I'm not sure, and at the very least it is unproven so far.
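To give a sense of what such a loss function could even look like, here is one conceivable self-supervised proxy, in the spirit of conservation-based approaches like Hamiltonian neural networks rather than anything the post proposes: since the true energy is conserved along a trajectory of a closed system, train a scalar head to be constant within trajectories while still varying across trajectories (to rule out the trivial constant solution).

```python
import torch
import torch.nn as nn

class EnergyHead(nn.Module):
    """Maps an observation (e.g. a flattened frame) to a scalar 'energy' estimate."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def conservation_loss(energy_head, traj):
    """traj: (batch, time, obs_dim) observations, each row one closed-system rollout.

    Penalize variation of the estimated energy within a trajectory (it should be
    conserved) and reward variation across trajectories (to avoid the degenerate
    constant solution). Both terms, and the 0.1 weight, are assumptions of this sketch.
    """
    e = energy_head(traj)          # (batch, time)
    within = e.var(dim=1).mean()   # ~0 if the head tracks a conserved quantity
    across = e.mean(dim=1).var()   # should stay large
    return within - 0.1 * across

# Hypothetical usage with placeholder data standing in for real trajectories:
head = EnergyHead(obs_dim=32)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
traj = torch.randn(8, 20, 32)
opt.zero_grad()
loss = conservation_loss(head, traj)
loss.backward()
opt.step()
```

Whether a head trained this way would latch onto anything like physical energy, rather than some other approximately conserved statistic, is exactly the kind of thing that remains unproven.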

I think another possibility is that there's something fundamentally wrong with this framing:

An agent is characterized by a Markov blanket in the world that has informational input/output channels for the agent to get information to observe the world and send out information to act on it.

As humans, we have a natural concept of e.g. force and energy because we can use our muscles to apply a force, and we take in energy through food. That is, our input/output channels are not simply about information; they also cover energetic dynamics.

This can, technically speaking, be modelled with the computationalist approach. You can say the agent has uncertainty over the size of the effects of its actions, and as it learns to model these effect sizes, it gets information about energy. But actually formalizing this would require quite complex derivations with a recursive structure based on the value of information, so it's unclear what would happen, and the computationalist approach really isn't mathematically oriented towards making it easy.
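A minimal version of "uncertainty over the size of the effects of its actions" (my toy setup, not a formalization of the post's proposal): the agent pushes with known action magnitudes, observes noisy displacements, and does a conjugate Gaussian update on an unknown gain relating action to effect; learning that gain is where force- and energy-like quantities would start to enter the picture.

```python
import numpy as np

rng = np.random.default_rng(1)

true_gain = 2.5   # unknown "effect size" relating action magnitude to displacement
obs_noise = 0.5   # std of the observation noise

mu, var = 0.0, 10.0   # Gaussian prior over the gain

for _ in range(50):
    a = rng.uniform(-1, 1)                        # action the agent sends out
    y = true_gain * a + obs_noise * rng.normal()  # observed effect of that action
    # Conjugate update for the model y = gain * a + noise with a Gaussian prior on gain.
    post_prec = 1 / var + a**2 / obs_noise**2
    mu = (mu / var + a * y / obs_noise**2) / post_prec
    var = 1 / post_prec

print(mu, var)   # posterior concentrates near true_gain
```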



