Agent foundations: not really math, not really science

Published on August 17, 2025 5:48 AM GMT

These ideas are not well-communicated, and I'm hoping readers can help me understand them better in the comments.

The classical model of the scientific process is that its purpose is to find a theory that explains an observed phenomenon. Once you have any model whose outputs match your observations, you have a valid candidate theory. Occam's razor says it should be simple. And if your theory can make correct predictions about observations that hadn't previously been made, then the theory is validated.

The classical model of mathematics is that you start with axioms and inference rules, and you derive theorems. There is no requirement that the axioms or theorems need to reflect something in reality to be considered mathematically valid (although they almost always do). Mathematicians have intuitions about what theorems are true before they prove them, and they have opinions about what theorems are important or meaningful, based partly on aesthetics.

What I[1] am trying to do with agent foundations is not really either of these, and I think this is one reason why many people don't "get" agent foundations. We're trying to understand a phenomenon in the real world (agents), but our methods are almost exclusively mathematical (or arguably philosophical). The nature of the phenomenon is substrate-independent, and so we don't need to interact directly with "reality" to do our work. But we're also not totally sure which substrate-independent thing it is, so we're still working out which mathematical objects are the right ones to be working with.

I do think this makes it a harder type of research. I just also think it's the type of research we have to do to get a good future.

Empirics

This mismatch becomes especially salient when considering the field's relationship to empiricism. People sometimes ask (understandably!) agent foundations researchers what experiments they plan to do. And sometimes people imply that because the field is not doing experiments, it is probably detached from reality and not useful. I have found these interactions awkward and unsatisfying for both parties, I think because we don't have a shared concept for me to refer to, somewhere between science and math.

From where I'm standing, it's hard to even think of how experiments would be relevant to what I'm doing. It feels like someone asking me why I haven't soldered up a prototype. That's just... not the kind of thing agent foundations is. I can imagine experiments that might sound like they're related to agent foundations, but they would just be checking a box on a bureaucratic form, and not actually generated by me trying to solve the problem.

I spend my time reading math books, pacing around thinking really hard, and trying to formulate and prove theorems. I regularly consult my beliefs about how the ideas can eventually be applied to reality, to guide what math I'm thinking about, but at no point have I thought to myself "what I need now is to run an experiment". The closest I come is searching for whether people have already written papers about the ideas I'm developing, or sanity-checking my thoughts by talking to other researchers.

What makes agent foundations different?

One thing that makes agent foundations different from science is that we're trying to understand a phenomenon that hasn't occurred yet (but which we have extremely good reasons for believing will occur). I can't do experiments on powerful agents, because they don't exist. And of course, the whole point here is that they're fatally dangerous by default, so bringing them into existence would not be worth the information gained from such an "experiment". I also cannot usefully do experiments on existing AI models, because they're not displaying the phenomenon that I'm trying to understand.[2]

With normal science, there's a phenomenon that we observe, and what we want is to figure out the underlying laws. With AI systems, it's more accurate to say that we know the underlying laws (such as the mathematics of computation, and the "initial conditions" of learning algorithms) and we're trying to figure out what phenomena will occur (e.g. what fraction of them will undergo instrumental convergence).
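This inversion can be made concrete with a toy example of my own (not from the post): an elementary cellular automaton. The "underlying law" below, Rule 30, is a one-line update rule, yet the phenomena it produces are famously hard to predict without simulating or analyzing it. (The helper name `rule30_step` is made up for this sketch.)

```python
# Rule 30 cellular automaton: the law is fully known and trivially
# small, but the phenomena (the patterns it generates) must be
# discovered by analysis or simulation. Periodic boundary conditions.

def rule30_step(cells):
    """One synchronous update: new cell = left XOR (center OR right)."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n]) for i in range(n)]

row = [0] * 31
row[15] = 1  # start from a single live cell
history = [row]
for _ in range(4):
    row = rule30_step(row)
    history.append(row)
```

Knowing `rule30_step` completely does not tell you, say, the density of live cells a thousand steps out; that is the "what phenomena will occur" question.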

So, I don't think that what we're lacking is data or information about the nature of agents -- we're lacking understanding of the information we already have. The reason I'm not thinking about experiments is that I don't feel any pull toward gaining more information of that type. I'm not confused in a way where looking at something in the territory will resolve my confusion. I believe the answers to my research questions are already contained within what we know, in the same way that the truth-value of conjectures is already contained within the logic, axioms, and definitions.

If we were trying to figure out chemistry and material science, then we absolutely would need tons of information, because our everyday observations are simply insufficient information to pin down the true theory of matter. There are tons of ways that the underlying laws of physics of stuff could be, and you can't simply figure it out by thinking about it.

But I don't think that's true for agents. I'm not saying that I think I could have been born in an armchair and then done nothing but think until one day I eventually understood agents. But I am saying that the decades of life I've already lived, combined with intensive interactions with other researchers, give me sufficient real-world information about agents.

It's kinda like "computer science"

For some reason, the field that studies the mathematics of computation ended up being called computer science. This might be non-coincidentally related to what I'm trying to express about agent foundations. Computation is substrate-independent, so after we figured out the definition of computation which usefully captured the phenomenon we wanted to engineer, we no longer had to check with reality about it to make progress on important questions.
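As a concrete illustration of that substrate-independence (a sketch of my own; names like `run_tm` are invented for this example): a Turing machine is nothing but a finite transition table plus a tape, and you can study one without committing to any physical substrate at all.

```python
# A Turing machine is a purely mathematical object: a transition table
# {(state, symbol): (new_state, write_symbol, move)} plus an unbounded
# tape. Simulating it requires no transistors, relays, or gears.

def run_tm(transitions, tape, state="q0", head=0, max_steps=10_000):
    """Run a Turing machine; halt when no transition applies."""
    cells = dict(enumerate(tape))  # sparse tape, blank symbol is "_"
    for _ in range(max_steps):
        symbol = cells.get(head, "_")
        if (state, symbol) not in transitions:
            break  # halt: no rule for this (state, symbol) pair
        state, write, move = transitions[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    return state, "".join(cells[i] for i in sorted(cells)).strip("_")

# A machine that flips every bit of a binary string, then halts.
flip = {
    ("q0", "0"): ("q0", "1", "R"),
    ("q0", "1"): ("q0", "0", "R"),
}

state, result = run_tm(flip, "1011")
```

The flip machine halts after rewriting "1011" to "0100"; whether an arbitrary table halts at all is, in general, undecidable, which is exactly the kind of fact one establishes without touching hardware.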

I don't think that Archimedes could have figured out basically any results of computability theory. This is despite the fact that, in "theory", one could figure that all out by thinking. (He even had humans as examples of general-purpose computers.) But that's not really sufficient. One needs the kind of life experience that points one's mind toward the relevant concepts, like "computing machines" at all. But once someone has those concepts, they don't necessarily need more information from experiments to figure out a bunch of computability theory. I think if people in Charles Babbage's era had decided that we needed to grok the nature of computation-in-general in order to save the world, then they could have done it, and done so without figuring out transistors or magnetic memory or whatever. It's noteworthy that humanity did indeed deliberately invent the first Turing-complete programming languages before building Turing-complete computers, and we have also figured out a lot of the theory of quantum computing before building actual quantum computers.

When Alan Turing figured out computability theory, he was not doing pure math for math's sake; he was trying to grok the nature of computation so that we could actually build better computers. And he was not doing typical science, either. He obviously had considerable experience with computers, but I seriously doubt that, for example, work on his 1936 paper involved running into issues which were resolved by doing experiments. I would say agent foundations researchers have similarly considerable experience with agents.

We need a lot of help

I'm pretty sure that there's a nature-carving concept of non-fooming powerful optimizers, and corrigible agents, and other things, and that if we figure them out, we can navigate the future more safely. And I'm pretty sure it doesn't make sense for me to do experiments to figure it out. Instead I have to learn or invent enough math to have the right concepts, and then prove theorems about them, and that will help enable us to build said safe optimizers.

Cosmologists were able to construct an unimaginably precise and deep theory of the origin of the universe, despite never being able to perform interventional experiments. Nuclear physicists were able to get the first nuclear reactions and detonations (mostly) right on the first try.

Maybe if we can get as many agent foundations researchers as there were nuclear physicists or cosmologists, we can collectively build enough understanding of the nature of agents to navigate to a good future.

  1. ^

    I say "I" here only because I don't want to put words in the mouths of other agent foundations researchers. My sense is that what I'm saying here is true for the whole field, but other researchers should feel free to chime in.

  2. ^

    Other sub-fields of AI safety can usefully do experiments on existing models, because they're asking different questions (like "how can we interpret existing models?" and "in what ways are existing models dangerous?"). This research is much more like a standard science, and that's great! AI safety needs a million different people doing a million different jobs. I think agent foundations is one of those jobs.


