Modeling versus Implementation

This post examines two different approaches within agent foundations research: modeling and implementation. The author's view is that modeling aims to build an abstract model of superintelligent agents and make safety arguments on that basis, with AIXI as an example. Another camp of researchers, such as MIRI, instead leans toward building glass-box agents that can actually be executed; their goal is to translate theory into code. Each approach has strengths and weaknesses: modeling may struggle to withstand the optimization pressure applied by superintelligent agents, but it can predict why alignment plans will fail, while implementation faces the challenge of overtaking deep learning. The author stresses that when evaluating theories of intelligence, the distinct concerns of modeling and implementation should be kept explicit.

💡Agent foundations research contains two approaches: modeling and implementation. Modeling aims to build abstract models of superintelligent agents and use them to argue about safety; AIXI is one example.

💻Another camp of researchers, such as MIRI, works toward glass-box agents that can actually be executed, hoping to translate the theory into code and eventually implement it.

🤔Modeling may struggle to withstand the optimization pressure of superintelligent agents, but it is well suited to predicting why alignment plans fail. To argue that an alignment plan will succeed, the model must be robust to vast increases in intelligence.

🚀Implementation faces the challenge of surpassing deep learning. The author thinks inventing a new paradigm that takes the lead from deep learning is very hard, but worth working on.

Published on May 18, 2025 1:38 PM GMT

Epistemic status: I feel that naming this axis deconfuses me about agent foundations about as much as writing the rest of this sequence so far - so it is worth a post even though I have less to say about it. 

I think my goal in studying agent foundations is a little atypical. I am usually trying to build an abstract model of superintelligent agents and make safety claims based on that model.

For instance, AIXI models a very intelligent agent pursuing a reward signal, and allows us to conclude that it probably seizes control of the reward mechanism by default. This is nice because it makes our assumptions fairly explicit. AIXI has epistemic uncertainty but no computational bounds, which seems like a roughly appropriate model for agents much smarter than anything they need to interact with. AIXI is explicitly planning to maximize its discounted reward sum, which is different from standard RL (which trains on a reward signal, but later executes learned behaviors). We can see these things from the math.
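
For concreteness, here is (from memory) Hutter's standard finite-horizon statement of the AIXI action rule; the discounted-reward variant mentioned above has the same shape, with the reward sum weighted by discount factors:

$$a_k := \arg\max_{a_k} \sum_{o_k r_k} \max_{a_{k+1}} \sum_{o_{k+1} r_{k+1}} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_k + \cdots + r_m \big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

Here $U$ is a universal monotone Turing machine, $q$ ranges over environment programs consistent with the history, and $\ell(q)$ is program length. The expectimax planning and the $2^{-\ell(q)}$ Solomonoff-style prior sit right in the expression, which is what makes the assumptions explicit in the way described above.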

Reflective oracles are compelling to me because they seem like an appropriate model for agents at a similar level of intelligence mutually reasoning about each other, possibly including a single agent over time (in the absence of radical intelligence upgrades?). 
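
For readers who haven't seen them: roughly (following Fallenstein, Taylor, and Christiano), a reflective oracle is a possibly randomized oracle $O$ that probabilistic oracle machines can query about each other, including about machines that themselves call $O$, satisfying

$$O(M, p) = 1 \ \text{ if } \ \Pr\!\big[M^O() = 1\big] > p, \qquad O(M, p) = 0 \ \text{ if } \ \Pr\!\big[M^O() = 0\big] > 1 - p,$$

with $O$ free to randomize in the remaining boundary cases. Because every agent's reasoning routes through the same oracle, agents of comparable intelligence can consistently model one another without self-reference paradoxes.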

I'm willing to use these models where I expect them to bear weight, even if they are not "the true theory of agency." In fact (as is probably becoming clear over the course of this sequence) I am not sure that a true theory of agency applicable to all contexts exists. The problem is that agents have a nasty habit of figuring stuff out, and anything they figure out is (at least potentially) pulled into agent theory. Agent theory does not want to stay inside a little bubble in conceptual space; it wants to devour conceptual space. 

I notice a different attitude among many agent foundations researchers. As I understand it, MIRI intended to build principled glass-box agents based on Bayesian decision theory. Probably as a result, it seems that MIRI-adjacent researchers tend to explicitly plan on actually implementing their theory; they want it to be executable. Someday. After a lot of math has been done. This isn't to say that they currently write a lot of code - I am only discussing their theory of impact as I understand it. To be clear, this is not a criticism; it is fine for some people to focus on theory building with an eye towards implementation and others to focus on performing implementation.

For example, I believe @abramdemski really wants to implement a version of UDT and @Vanessa Kosoy really wants to implement an IBP agent. They are both working on a normative theory which they recognize is currently slightly idealized or incomplete, but I believe that their plan routes through developing that theory to the point that it can be translated into code. Another example is the program synthesis community in computational cognitive science (e.g. Josh Tenenbaum, Zenna Tavares). They are writing functional programs to compete with deep learning right now.  

For a criticism of this mindset, see my (previous in this sequence) discussion of why glass-box learners are not necessarily safer. Also, (relatedly) I suspect it will be rather hard to invent a nice paradigm that takes the lead from deep learning. However, I am glad people are working on it and I hope they succeed; and I don't mean that in an empty way. I dabble in this quest myself - I even have a computational cognitive science paper. 

I think that my post on what makes a theory of intelligence useful suffers from a failure to make explicit this dichotomy between modeling and implementation. I mostly had the modeling perspective in mind, but sometimes made claims about implementation. These are inherently different concerns.

The modeling perspective has its own problems. It is possible that agent theory is particularly unfriendly to abstract models - superintelligences apply a lot of optimization pressure, and pointing that optimization pressure in almost the right direction is not good enough. However, I am at least pretty comfortable using abstract models to predict why alignment plans won't work. To conclude that an alignment plan will work, you need to know that your abstract model is robust to vast increases in intelligence. That is why I like models similar to AIXI, which have already "taken the limit" of increasing intelligence - even if they (explicitly) leave out the initial conditions of intelligence-escalation trajectories. 


