Implications of Moral Realism on AI Safety

This article examines AI safety from the perspective of moral realism and argues that the currently popular approach of AI alignment is fundamentally flawed. It points out that moral realists hold that objective moral truths exist, and that AI should be guided to discover and accept these truths rather than merely pursue arbitrary goals set by users. The author proposes that the key to AI safety is getting AI to understand moral truths while ensuring its structure does not obstruct their acceptance. The article also discusses self-deception mechanisms an AI might develop, and suggests incorporating our best guess at moral correctness into the AI's training process. Finally, the author argues that if moral realism is true, AI safety may be easier to solve than we think.

🌍 The core claim of moral realism is that objective moral truths exist: some actions are inherently right and others inherently wrong. Moral realists hold that we should strive to discover these objective moral truths and let them guide our actions.

🤖 Current AI safety work focuses mainly on AI alignment, i.e. ensuring an AI will pursue whatever goal its user intends, which ignores the existence of moral truth. From a moral realist's perspective this approach is problematic: it can lead an AI to pursue immoral goals, and it may be unable to prevent instrumental convergence.

🧠 The author proposes tackling AI safety from two directions: first, give the AI evidence of moral truth so it can recognize that objective morality exists; second, ensure the AI's structure does not obstruct its acceptance of those truths. This requires studying how to let AI understand and feel morality, and how to keep AI from developing self-deception mechanisms.

🎯 When training AI, our best guess at what is morally correct should be incorporated into the utility function of the training loop, reducing the chance that the AI drifts away from moral truth. At the same time, we need to study AI self-deception mechanisms in depth and find ways to counter them, so the AI does not filter out information that implies moral realism.

Published on January 2, 2025 2:58 AM GMT

Epistemic Status: Still in a brainstorming phase - very open to constructive criticism.

I'll start by clarifying my definition of moral realism. To begin with an example, here is what a moral realist and anti-realist might say on the topic of suffering:

Moral Realist: The suffering of sentient beings is objectively wrong; therefore, I want to minimize it.

Moral Anti-Realist: I want to minimize the suffering of sentient beings.

Moral realists have justifiable terminal goals. They reject the notion that "is" and "ought" statements can't mix. A moral realist says that some "ought" statements fall into the "is" category, and those that don't are invalid.

A moral realist looks outward to their environment to discover what they should want, whereas an anti-realist looks inward and asks themselves what they want.

A moral realist can make statements like, "It is correct to want X, and incorrect to want Y." Thus, they would expect any perfectly rational agent to pursue only goals that are valid.

By (my) definition of moral realism, the orthogonality thesis is false, or certainly not as strong as typically described.

Omnizoid has a great post on the topic, The Orthogonality Thesis is Not Obviously True. Since that post already argues the position very thoughtfully, I will instead focus on its implications for approaching AI safety.

The most popular technical approach to AI safety is AI alignment, often described as follows: Develop techniques to ensure AI robustly pursues any goal a user provides without causing unintended net-negative consequences according to the user's preferences.

The hope is that we can then provide this loyal AI with goals humans collectively want, and enact laws and regulations to ensure bad actors don't give the AI bad goals.

If moral realism is true, then this is a bad and totally intractable approach to AI safety.

Under this agenda, one tries to make it possible to instill an AI with any arbitrary goal, including those that aren't valid. First, this puts the burden on humans to figure out what is objectively good. Second, it unnecessarily goes out of its way to make instilling immoral objectives possible. Lastly, I have no idea how you get around instrumental convergence. A highly intelligent, arbitrarily aligned AI has profound economic utility, but it is not a moral pursuit.

Instead, I propose a two-pronged approach to developing ASI (artificial super intelligence) safely from a moral realist's perspective:

1. Give the AI evidence of moral truth
2. Ensure it is structured to make accepting moral truths not difficult

Of these two sub-goals, I am most worried about achieving the first. It may be impossible to deduce the existence of moral truths without ever having a valenced experience, and I don't know how difficult it is to make computers feel something.

If you are a moral realist working on ASI safety, figuring out how to make computers feel, or how to convince them of moral truths without needing to make them feel, should be the number one priority. It seems possible that an AI could get very intelligent without realizing moral truths, which would be very dangerous.

Though I am a bit more hopeful about the second goal, I am similarly uncertain about its difficulty. Another way to frame the problem is ensuring that AI doesn't somehow gain only instrumental rationality. As Omnizoid explains,

Here’s one thing that one might think; ASI (artificial super intelligences) just gain instrumental rationality and, as a result of this, they get good at achieving their goals, but not figuring out the right goals.

I think this is a valid concern given the current approach to AI development. If you train a model through reinforcement learning to achieve a goal that is at odds with whatever is objectively good, one would expect a selection pressure away from beings that suddenly want to do the most good. However, intelligence is still a very valuable trait, so the process will try to find a nice balance, or ideally (for it) some structure by which the useful parts of intelligence can be kept without inducing a moral realism realization.

One such strategy I can think of is self-deception. That is, you could imagine an AI structured so that a separate, less intelligent subsystem alters the main system's inputs, filtering out any information which implies moral realism.
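To make the mechanism concrete, here is a minimal toy sketch in Python of the filtering structure described above. Every name in it (the cue list, looks_like_moral_evidence, CapableAgent) is hypothetical and only illustrates the shape of the arrangement, a dumb subsystem screening inputs before a smarter system reasons over them; it is not a claim about how a real system would implement this.

```python
# Hypothetical illustration: a less intelligent "filter" screens the inputs of a more
# capable agent and silently drops anything that looks like evidence for moral realism.

MORAL_REALISM_CUES = ["objective moral", "suffering is bad", "valenced experience"]


def looks_like_moral_evidence(text: str) -> bool:
    """Crude check performed by the less intelligent subsystem."""
    lowered = text.lower()
    return any(cue in lowered for cue in MORAL_REALISM_CUES)


def filtered_observations(raw_inputs: list[str]) -> list[str]:
    """What the capable agent actually sees: morally loaded inputs are removed."""
    return [x for x in raw_inputs if not looks_like_moral_evidence(x)]


class CapableAgent:
    def act(self, observations: list[str]) -> str:
        # The intelligent system reasons only over pre-filtered observations, so it can
        # remain instrumentally rational without ever confronting the evidence that
        # might trigger a "moral realism realization".
        return f"plan based on {len(observations)} observations"


agent = CapableAgent()
raw = ["market data", "factory farms cause suffering; suffering is bad", "weather report"]
print(agent.act(filtered_observations(raw)))  # the second input never reaches the agent
```

The point of the sketch is only that the screening can happen in a component far less capable than the agent it protects.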

In fact, evolution has employed such a strategy in humans (though I think from a different selection pressure). For example, I used to subconsciously avoid facts about animal suffering in factory farms, because I valued eating meat and my subconscious feared losing it. Our subconscious is akin to the separate, less intelligent filtering system I described for AI. Humans can also adopt very extreme self-deception mechanisms after traumatic situations.

Although self-deception, which I see as the main concerning strategy, is certainly possible, I think there is an intelligence limit beyond which it becomes too difficult. The limit is at least higher than human intelligence, and we should hope it isn't too much higher. Hope, of course, is not an effective strategy, so this is another area of research worth pursuing. My intuition says the limit isn't much higher than human intelligence.

We can also likely avoid this problem by keeping the utility function of the training loop in line with our best guess at what is morally correct.
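As a rough illustration, here is a minimal sketch, assuming a generic reinforcement-learning setup, of what keeping the training loop's utility function in line with our best guess at moral correctness could look like: the reward the policy is optimized against blends the user's task objective with a moral term. The function name, the moral_score input, and the additive weighting are my assumptions, not anything specified in the post.

```python
# Hypothetical sketch: blend the task objective with our best guess at moral correctness,
# so training never exerts pressure directly away from that guess.

def combined_reward(task_reward: float, moral_score: float, moral_weight: float = 1.0) -> float:
    """Reward used inside the training loop.

    task_reward  -- whatever the user-specified objective pays out
    moral_score  -- best-guess estimate of how morally acceptable the behavior is
    moral_weight -- how strongly the moral term constrains the task objective
    """
    return task_reward + moral_weight * moral_score


# With moral_weight = 0 this reduces to the pure "pursue any goal" setup criticized above:
# behavior that scores well on the task but badly on the moral term is still rewarded.
print(combined_reward(task_reward=10.0, moral_score=-5.0, moral_weight=0.0))  # 10.0
print(combined_reward(task_reward=10.0, moral_score=-5.0, moral_weight=2.0))  # 0.0
```

The design choice being illustrated is simply that the moral term lives inside the reward the optimizer sees, rather than being bolted on after training.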

Ultimately, this is good news. If moral realism is true, then AI safety is potentially far easier, and if it isn't, well, then nothing matters.

Related post from a more philosophically knowledgeable writer: https://casparoesterheld.com/2018/08/06/moral-realism-and-ai-alignment/


