Is theory good or bad for AI safety?

 


Published on January 19, 2025 10:32 AM GMT

We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard. (Kennedy’s famous “We choose to go to the moon” speech)

 

The ‘real’ mathematics of ‘real’ mathematicians, …, is almost wholly ‘useless’ (Hardy’s “A Mathematician’s Apology”)

 

If the "irrational" agent is outcompeting you on a systematic and predictable basis, then it is time to reconsider what you think is "rational". (Yudkowsky’s “Rationality is Systematized Winning”)

 

Shut up and calculate (Mermin, apparently)

I have been writing a long post about modeling theory in different sciences, specifically with a focus on elegance/pragmatism tradeoffs and how to reason about them in the context of safety. It has ballooned (as these things tend to do), and I'm probably going to write it in a few installments as a sequence.

But before going in, it's worth explaining why I think building better models and a better language here is crucial.

First, let's answer the question. Is theory good or bad?

If I were to summarise my position on this in one paragraph, it would be “it’s complicated”, with a Fiddler on the Roof-style sequence of ‘on the other hands’ to follow.

And so on.

When I talk to my team at PIBBSS and my friends in AI safety, we have interesting, nuanced debates. My teammates have written about related things here, here and here. But when I look around, what dominates the discourse seems to be very low-context discussions of “THEORY GOOD” or “THEORY BAD”. Millions of dollars in funding are distributed on the premise of barely nuanced versions of one or the other of these slogans, and I don’t like it.

On the one hand, this isn’t an easily fixable situation where someone can just come in and explain what the right takes are. Questions about theory in AI are hard to reason about for a number of reasons.

But on the other hand, the really awful state of the debate and the low "sanity waterline" in institutional thinking about theory and fundamental science are surprising to me. There is extremely low-hanging fruit that is not being picked. There are useful things to say and useful models to build. And when I look around, I don’t see nearly as much effort as I’d like going into doing this.

What we lack here is not so much a "textbook of all of science that everyone needs to read and understand deeply before even being allowed to participate in the debate". Rather, we lack good, commonly held models of how to reason about theory, and good terms to (try to) coordinate around and use in debates and decisions.

The AI safety community, having much cultural and linguistic overlap with the LessWrong community (e.g. I am writing this here), has a lot of the machinery for building good models. I really liked the essays by Yudkowsky on science and scientists, like this one. I also really like the linked initiatives by Elizabeth Van Nostrand and Simon DeDeo's group on trying to think more rigorously about path-dependence and attribution in the history of science (and getting my favorite kind of answer: it's complicated, but we can still kinda build better models).

I think there should be more work of this type. But at the same time, as I mentioned before, I think this community has a bit of an issue with reductionism. This biases the community to reduce the core concepts in building theory to something mathy and precise -- "abstraction is description length" or "elegance is consilience". While these constitute valuable formal models and intuition pumps, they do not capture the fact that abstraction and elegance are their own kind of thing, like the notion of positional thinking in chess -- they are not equivalent to formal models thereof. Now I'm not about to say that there is some zen enlightenment that you will only attain once you have purified yourself at the altar of graduate school. These notions can be modeled well, I think, without the lived experience, in the same way that a chess player can explain how she balances positional and tactical thinking to someone who does not have much experience in the game. A good baseline of concepts to coordinate around here is possible; it just hasn't (to the best of my knowledge) been built or internalized.

I want to point at Lauren's post here in particular as a physics perspective on the notion of "something being physical" as a valuable, non-reducible notion that can contribute to better conceptualization here.

In the next couple of posts in this sequence I am hoping to build up a little more of such a language. I'm aware that I'll probably be reinventing the wheel a lot, and what I'll be giving is a limited take. The hope is that this will start a conversation where more people, perhaps with better ways of operationalizing this, will start coordinating on filling this gap with a bit of a consensus vocabulary. 


