Towards more cooperative AI safety strategies

This post examines three key issues in AI safety. First, the AI safety community is structurally power-seeking, which can bring it into conflict with other communities. Second, the world has strong defense mechanisms against power-seeking, which may hinder the AI safety community's efforts. Finally, as AI develops, the variance of power-seeking AI safety strategies will increase, posing greater challenges for AI safety work.

💬 **Power-seeking by the AI safety community** The AI safety community is structurally power-seeking. This does not mean its members are personally selfish or power-hungry; rather, accumulating power is an effective strategy for achieving AI safety goals. For example, the community seeks large amounts of funding, influence within governments and companies, control over how AIs' values are shaped, prioritization of people concerned about AI risk, restrictions on the public release of information (such as research results and model weights), and the recruitment of students. But this pursuit of power can reduce outside trust in the AI safety community, because it is hard for observers to judge whether its motivations are pure. The AI safety community is more markedly power-seeking than other comparable communities, mainly because:

* **Consequentialism**: the community is more consequentialist and more focused on efficiency and effectiveness.
* **Urgency and responsibility**: the community feels a stronger sense of urgency and responsibility about AI risk, believing the world may not act until it is too late and that a centralized plan is therefore needed.
* **Elite focus**: the community is more oriented toward elites with homogeneous motivations, which reflects the field's newness, the abstractness of the risks, and founder effects.

👀 **The world's defense mechanisms** The world has strong defense mechanisms against power-seeking: people are wary of any attempt to gain power, and such attempts often provoke backlash. In AI safety, these defense mechanisms have included:

* strong criticism of not releasing models publicly
* strong criticism of centralized funding
* criticism about "whose values" AIs will be aligned to
* criticism of the AI safety community from open-source AI advocates

These defense mechanisms often operate regardless of motivation: even when a policy has good arguments behind it, people judge it by its effect on the overall balance of power. The AI safety community needs to recognize these defense mechanisms and avoid excessive power-seeking, which can cause reputational damage and fuel an "us vs. them" tribalist mindset that makes truth-seeking harder.

💡 **The variance of power-seeking** As AI develops, the variance of power-seeking AI safety strategies will increase, posing greater challenges for AI safety work. Those who take AGI and ASI seriously have opportunities to make investments (of money, time, social capital, etc.) that will yield much more power in the future if AI continues to advance. But as attention to AI grows, struggles over who controls it will become more intense. So far there have been relatively few such struggles, because people do not yet see control over AI as an important form of power; that will change. As these power struggles grow in scale, more people who are very good at winning them will get involved, making power-seeking AI safety strategies harder to carry out.

👻 **Mitigation strategies** Two strategies can help address these challenges:

* **Focus on legitimacy**: work that informs the public, or that creates mechanisms to prevent power from becoming too concentrated even if AGI arrives, is much less likely to be perceived as power-seeking.
* **Prioritize competence**: ultimately, humanity is mostly in the same boat; we are all incumbents facing displacement by AGI. Right now many people are making predictable mistakes because they do not yet take AGI seriously, and these mistakes should become less common as AGI's capabilities and risks become less speculative. So it matters less whether decision-makers are currently concerned about AI risk, and more whether they are broadly competent and able to respond sensibly to confusing and stressful situations, which will become increasingly common as the AI revolution accelerates.

Published on July 16, 2024 4:36 AM GMT

This post is written in a spirit of constructive criticism. It's phrased fairly abstractly, in part because it's a sensitive topic, but I welcome critiques and comments below. The post is structured in terms of three claims about the strategic dynamics of AI safety efforts; my main intention is to raise awareness of these dynamics, rather than advocate for any particular response to them.

Claim 1: The AI safety community is structurally power-seeking.

By “structurally power-seeking” I mean: tends to take actions which significantly increase its power. This does not imply that people in the AI safety community are selfish or power-hungry; or even that these strategies are misguided. Taking the right actions for the right reasons often involves accumulating some amount of power. However, from the perspective of an external observer, it’s difficult to know how much to trust stated motivations, especially when they often lead to the same outcomes as self-interested power-seeking.

Some prominent examples of structural power-seeking include:

* Seeking large amounts of funding.
* Seeking influence within governments and AI companies.
* Seeking control over how AIs' values are shaped.
* Prioritizing hiring people who are concerned about AI risk.
* Discouraging the public release of information (e.g. research results, model weights).
* Recruiting students.

To be clear, you can’t get anything done without being structurally power-seeking to some extent. However, I do think that the AI safety community is more structurally power-seeking than other analogous communities (such as most other advocacy groups). Some reasons for this disparity include:

* The AI safety community is more consequentialist and more focused on effectiveness than most other communities. When reasoning on a top-down basis, seeking power is an obvious strategy for achieving one's desired consequences (but can be aversive to deontologists or virtue ethicists).
* The AI safety community feels a stronger sense of urgency and responsibility than most other communities. Many in the community believe that the rest of the world won't take action until it's too late; and that it's necessary to have a centralized plan.
* The AI safety community is more focused on elites with homogeneous motivations than most other communities. In part this is because it's newer than (e.g.) the environmentalist movement; in part it's because the risks involved are more abstract; in part it's a founder effect.

Again, these are intended as descriptions rather than judgments. Traits like urgency, consequentialism, etc, are often appropriate. But the fact that the AI safety community is structurally power-seeking to an unusual degree makes it important to grapple with another point:

Claim 2: The world has strong defense mechanisms against (structural) power-seeking.

In general, we should think of the wider world as being very cautious about perceived attempts to gain power; and we should expect that such attempts will often encounter backlash. In the context of AI safety, some types of backlash have included:

* Strong public criticism of not releasing models publicly.
* Strong public criticism of centralized funding (e.g. billionaire philanthropy).
* Various journalism campaigns taking a "conspiratorial" angle on AI safety.
* Strong criticism from the FATE community about "whose values" AIs will be aligned to.
* The development of an accelerationist movement focused on open-source AI.

These defense mechanisms often apply regardless of stated motivations. That is, even if there are good arguments for a particular policy, people will often look at the net effect on overall power balance when judging it. This is a useful strategy in a world where arguments are often post-hoc justifications for power-seeking behavior.

To be clear, it’s not necessary to avoid these defense mechanisms at all costs. It’s easy to overrate the effect of negative publicity; and attempts to avoid that publicity are often more costly than the publicity itself. But reputational costs do accumulate over time, and also contribute to a tribalist mindset of “us vs them” (as seen most notably in the open-source debate) which makes truth-seeking harder.

Claim 3: The variance of (structurally) power-seeking strategies will continue to increase.

Those who currently take AGI and ASI seriously have opportunities to make investments (of money, time, social capital, etc) which will lead to much more power in the future if AI continues to become a much, much bigger deal.

But increasing attention to AI will also lead to increasingly high-stakes power struggles over who gets to control it. So far, we’ve seen relatively few such power struggles because people don’t believe that control over AI is an important type of power. That will change. To some extent this has already happened (with AI safety advocates being involved in the foundation of three leading AGI labs) but as power struggles become larger-scale, more people who are extremely good at winning them will become involved. That makes AI safety strategies which require power-seeking more difficult to carry out successfully.

How can we mitigate this issue? Two things come to mind. Firstly, focusing more on legitimacy. Work that focuses on informing the public, or creating mechanisms to ensure that power doesn’t become too concentrated even in the face of AGI, is much less likely to be perceived as power-seeking.

Secondly, prioritizing competence. Ultimately, humanity is mostly in the same boat: we're the incumbents who face displacement by AGI. Right now, many people are making predictable mistakes because they don't yet take AGI very seriously. We should expect this effect to decrease over time, as AGI capabilities and risks become less speculative. This consideration makes it less important that decision-makers are currently concerned about AI risk, and more important that they're broadly competent, and capable of responding sensibly to confusing and stressful situations, which will become increasingly common as the AI revolution speeds up.



