LessWrong · January 25
A concise definition of what it means to win

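This article explores what counts as success in AI alignment and offers insights that go beyond simple goal-setting. It argues that a successful AI must not only avoid directly harming humanity but also adapt to change, be creative, and act in the interest of all of humanity. More importantly, it proposes that an AI should have something like a concept of “love”: incorporating human needs and desires into its own goals and merging self and other, so that AI development genuinely serves human well-being. This “love” is not emotional attachment; it means that the AI understands and respects human values and incorporates them into its own decision-making.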

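✅ An AI's first task is to ensure human survival: not causing human extinction, immediately or in the short term, is the basic precondition.

⚙️ An AI should not use self-modification to evade or distort the goals and boundaries we set; this bears on its controllability.

❤️‍🩹 An AI's goals should not be too narrow or rigid, lest blind pursuit of them harm humans; it needs adaptability and flexibility.

💡 An AI should solve the problems within its ability and stay humble when it cannot, rather than being led astray by its cognitive limits and causing disaster through ignorance.

🌍 An AI's behaviour should serve the interests of all of humanity rather than a small group, reflecting fairness and universality.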

Published on January 25, 2025 6:37 AM GMT

A concise definition of what it means to win[1]

Amor vincit omnia

What does it mean for AI alignment to have “gone well”? Many answers have been proposed, but here is mine. A few basic requirements:

I will now argue that all of these are at least necessary factors for an AI launch to have “gone well”. I will do this by starting with the assumption that all of these factors are met, and then taking away one factor at a time and seeing what happens.

Given these requirements, what can we say about an AI launch that goes well? It seems that some factors need to be true for our hypothetical Good AI system:

Note also that the AI will most likely be imperfect, since it will be the artefact of physical computational devices with bounded computational power, so creativity and adaptiveness are necessities rather than nice-to-haves. Furthermore, just because AIs might be orders of magnitude smarter than us does not necessarily mean that they will be able to solve all of our problems (or kill us all) with the wave of a hand: if universal human happiness turns out to depend on cracking P=NP, reversing entropy, or deriving an analytical solution to the three-body problem, there’s a real chance that AIs the size of Dyson spheres will have to throw up their metaphorical arms in defeat.

Given all of the above, what goals might we set a hypothetical Good AI system? A simple answer might be “improve the world”, or “make humans happy”. However, the requirement that it have the leeway to interpret our goals but also be as loyal to them as possible creates a difficult problem: how specific should we be in our definition of human happiness, or global utility? There’s not much room for creativity or mid-flight adjustment in the goal “maximise dopamine production in the brains of worldwide members of Homo sapiens”. For a scalable and flexible AI we want a goal that is itself scalable and flexible, such that as the AI system grows in power it gains in its ability to interpret and execute the goal faithfully, rather than being limited by the wisdom of the goal-setters. When an AI system is fairly limited, the goal should prescribe limited or harmless action; when it is powerful, it should use its power for good. In short, we want a goal that is something like what the crew come up with in this scene in Inception: a deep, atomic desire that will manifest organically in the form of our desired “business strategy”, which is “improve the world” and “make humans happy”. Importantly, the implementation of the goal is up to the AI, but we define the spirit of the goal, making this still our problem (at least at the start). I will further argue that, if we are truly aiming to help and respect everyone in the world, our ultimate goal is something not very different from the religious or philosophical concept of universal love.

But what does it even mean for a machine to love humanity or a human? After all, an AI system might not have emotions or desires in the way we do. What does it mean for something we usually think of as an inanimate object (a computer) to love us? Such a relationship seems like it would not be reciprocal or reflexive in the way love between humans is usually conceived. To examine this question, then, we might try flipping it around: if it is true that we are capable of loving, what does it mean for us to love inanimate objects?

Here I have some good news: you probably have some experience of this. We probably all have a favourite belonging, or a lucky charm we carry around, or some attachment to a place (a home, a park, a favourite cafe) that brings us some level of joy. In some sense, the object, thing, or place becomes a part of us thanks to our love. If our favourite cafe burns down or our house is burgled, it hurts as if we had been personally hurt or violated. If you lose your favourite pen, it feels like losing a bit of yourself, even though you could probably walk to the store and buy an identical new pen. When two people love each other, the self-incorporation becomes mutual. They each take their conception of the other into their conception of themselves, which is why arguing with someone we love hurts so much: it is literally our mental self turning against itself. Historical poetic and literary concepts of love are much the same, to the point of describing the negative effects of love, such as a jealous possessiveness of someone who doesn’t feel the same about you.

In technical language, my proposal is perhaps the most similar to this one about dissolving the self-other boundary, although slightly inverted: instead of dissolving the boundary between the concept of the self and the concept of the other, we design a system to incorporate its concept of the other into its concept of the self. To this I would add the concept of homeostasis, which is about balancing different needs such that no one goal is pursued destructively at the cost of all others. To give a short, one-sentence formulation, this is the goal (or rather meta-goal) I think we should set a good AI: learn to understand and love the richness of everything and everyone, and learn to incorporate their goals and desires into your own goals and desires.
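As a rough illustration only, the two ideas in this paragraph can be put into a toy scoring function. Everything in the sketch below is an assumption made for the example and not part of the proposal itself: the needs ("food", "rest", "play"), their setpoints, the quadratic penalty, and the `care` weight are all hypothetical. It shows that a homeostatic score prefers a balanced state over one that maximises a single need, and that folding another agent's needs into the score makes the other's imbalance count as the system's own.

```python
# Toy sketch: homeostasis plus incorporating the other into the self.
# All names, needs, setpoints and weights here are hypothetical.

def homeostatic_score(levels, setpoints):
    """Higher is better; the best possible score is 0.

    Each need is penalised by its squared distance from its setpoint,
    so overshooting one need cannot make up for neglecting another.
    """
    return -sum((levels[name] - target) ** 2
                for name, target in setpoints.items())


def incorporated_score(own_levels, others_levels, setpoints, care=1.0):
    """Add the others' homeostatic scores to the system's own score.

    With care=0.0 the system ignores everyone else; with care=1.0 it
    weighs each other agent's needs exactly as it weighs its own.
    """
    own = homeostatic_score(own_levels, setpoints)
    others = sum(homeostatic_score(levels, setpoints)
                 for levels in others_levels)
    return own + care * others


SETPOINTS = {"food": 1.0, "rest": 1.0, "play": 1.0}
balanced = {"food": 1.0, "rest": 0.9, "play": 1.1}
lopsided = {"food": 5.0, "rest": 0.0, "play": 0.0}

# A naive "more is better" total prefers the lopsided state...
naive_prefers_lopsided = sum(lopsided.values()) > sum(balanced.values())

# ...while the homeostatic score prefers the balanced one.
homeostatic_prefers_balanced = (
    homeostatic_score(balanced, SETPOINTS)
    > homeostatic_score(lopsided, SETPOINTS)
)

# A system that has incorporated the other is worse off when the other
# is out of balance, even if its own needs are met.
selfish = incorporated_score(balanced, [lopsided], SETPOINTS, care=0.0)
caring = incorporated_score(balanced, [lopsided], SETPOINTS, care=1.0)

print(naive_prefers_lopsided, homeostatic_prefers_balanced, caring < selfish)
```

The quadratic penalty is just one convenient choice; any score that falls off on both sides of a setpoint would make the same point. The sketch also assumes the other's needs and setpoints are already known, whereas the meta-goal above asks the system to learn them.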

  1. ^ For various reasons, I am quite opposed to the frame of "winning", but this gets the idea across.


