Towards more cooperative AI safety strategies

This post examines three key issues in AI safety. First, the AI safety community is structurally power-seeking, which can bring it into conflict with other communities. Second, the world has strong defense mechanisms against power-seeking, which may hinder the AI safety community's efforts. Finally, as AI develops, the variance of power-seeking AI safety strategies will increase, posing greater challenges for AI safety work.

💬 **Power-seeking by the AI safety community** The AI safety community is structurally power-seeking. This does not mean its members are personally selfish or power-hungry; rather, accumulating power is an effective strategy for achieving AI safety goals. For example, the community seeks large amounts of funding, influence within governments and companies, control over how AIs' values are shaped, prioritization of people concerned about AI risk, restrictions on the public release of information (such as research results and model weights), and the recruitment of students. But this pursuit of power can reduce outside trust in the AI safety community, because it is hard for observers to judge whether its motivations are pure. The AI safety community is more markedly power-seeking than other comparable communities, mainly because:

* **Consequentialism**: the community is more consequentialist and more focused on efficiency and effectiveness.
* **Urgency and responsibility**: the community feels a stronger sense of urgency and responsibility about AI risk, believing the world may not act until it is too late and that a centralized plan is therefore needed.
* **Elite focus**: the community is more oriented toward elites with homogeneous motivations, which reflects the field's newness, the abstractness of the risks, and founder effects.

👀 **The world's defense mechanisms** The world has strong defense mechanisms against power-seeking: people are wary of any attempt to gain power, and such attempts often provoke backlash. In AI safety, these defense mechanisms have included:

* strong criticism of not releasing models publicly
* strong criticism of centralized funding
* criticism about "whose values" AIs will be aligned to
* criticism of the AI safety community from open-source AI advocates

These defense mechanisms often operate regardless of motivation: even when a policy has good arguments behind it, people judge it by its effect on the overall balance of power. The AI safety community needs to recognize these defense mechanisms and avoid excessive power-seeking, which can cause reputational damage and fuel an "us vs. them" tribalist mindset that makes truth-seeking harder.

💡 **The variance of power-seeking** As AI develops, the variance of power-seeking AI safety strategies will increase, posing greater challenges for AI safety work. Those who take AGI and ASI seriously have opportunities to make investments (of money, time, social capital, etc.) that will yield much more power in the future if AI continues to advance. But as attention to AI grows, struggles over who controls it will become more intense. So far there have been relatively few such struggles, because people do not yet see control over AI as an important form of power; that will change. As these power struggles grow in scale, more people who are very good at winning them will get involved, making power-seeking AI safety strategies harder to carry out.

👻 **Mitigation strategies** Two strategies can help address these challenges:

* **Focus on legitimacy**: work that informs the public, or that creates mechanisms to prevent power from becoming too concentrated even if AGI arrives, is much less likely to be perceived as power-seeking.
* **Prioritize competence**: ultimately, humanity is mostly in the same boat; we are all incumbents facing displacement by AGI. Right now many people are making predictable mistakes because they do not yet take AGI seriously, and these mistakes should become less common as AGI's capabilities and risks become less speculative. So it matters less whether decision-makers are currently concerned about AI risk, and more whether they are broadly competent and able to respond sensibly to confusing and stressful situations, which will become increasingly common as the AI revolution accelerates.

Published on July 16, 2024 4:36 AM GMT

This post is written in a spirit of constructive criticism. It's phrased fairly abstractly, in part because it's a sensitive topic, but I welcome critiques and comments below. The post is structured in terms of three claims about the strategic dynamics of AI safety efforts; my main intention is to raise awareness of these dynamics, rather than advocate for any particular response to them.

Claim 1: The AI safety community is structurally power-seeking.

By “structurally power-seeking” I mean: tends to take actions which significantly increase its power. This does not imply that people in the AI safety community are selfish or power-hungry; or even that these strategies are misguided. Taking the right actions for the right reasons often involves accumulating some amount of power. However, from the perspective of an external observer, it’s difficult to know how much to trust stated motivations, especially when they often lead to the same outcomes as self-interested power-seeking.

Some prominent examples of structural power-seeking include:

* Seeking large amounts of funding.
* Seeking influence within governments and AI companies.
* Seeking control over how AIs' values are shaped.
* Prioritizing hiring people who are concerned about AI risk.
* Discouraging the public release of information (e.g. research results, model weights).
* Recruiting students.

To be clear, you can’t get anything done without being structurally power-seeking to some extent. However, I do think that the AI safety community is more structurally power-seeking than other analogous communities (such as most other advocacy groups). Some reasons for this disparity include:

* The AI safety community is more consequentialist and more focused on effectiveness than most other communities. When reasoning on a top-down basis, seeking power is an obvious strategy for achieving one's desired consequences (but can be aversive to deontologists or virtue ethicists).
* The AI safety community feels a stronger sense of urgency and responsibility than most other communities. Many in the community believe that the rest of the world won't take action until it's too late; and that it's necessary to have a centralized plan.
* The AI safety community is more focused on elites with homogeneous motivations than most other communities. In part this is because it's newer than (e.g.) the environmentalist movement; in part it's because the risks involved are more abstract; in part it's a founder effect.

Again, these are intended as descriptions rather than judgments. Traits like urgency, consequentialism, etc, are often appropriate. But the fact that the AI safety community is structurally power-seeking to an unusual degree makes it important to grapple with another point:

Claim 2: The world has strong defense mechanisms against (structural) power-seeking.

In general, we should think of the wider world as being very cautious about perceived attempts to gain power; and we should expect that such attempts will often encounter backlash. In the context of AI safety, some types of backlash have included:

* Strong public criticism of not releasing models publicly.
* Strong public criticism of centralized funding (e.g. billionaire philanthropy).
* Various journalism campaigns taking a "conspiratorial" angle on AI safety.
* Strong criticism from the FATE community about "whose values" AIs will be aligned to.
* The development of an accelerationist movement focused on open-source AI.

These defense mechanisms often apply regardless of stated motivations. That is, even if there are good arguments for a particular policy, people will often look at the net effect on overall power balance when judging it. This is a useful strategy in a world where arguments are often post-hoc justifications for power-seeking behavior.

To be clear, it’s not necessary to avoid these defense mechanisms at all costs. It’s easy to overrate the effect of negative publicity; and attempts to avoid that publicity are often more costly than the publicity itself. But reputational costs do accumulate over time, and also contribute to a tribalist mindset of “us vs them” (as seen most notably in the open-source debate) which makes truth-seeking harder.

Claim 3: The variance of (structurally) power-seeking strategies will continue to increase.

Those who currently take AGI and ASI seriously have opportunities to make investments (of money, time, social capital, etc) which will lead to much more power in the future if AI continues to become a much, much bigger deal.

But increasing attention to AI will also lead to increasingly high-stakes power struggles over who gets to control it. So far, we’ve seen relatively few such power struggles because people don’t believe that control over AI is an important type of power. That will change. To some extent this has already happened (with AI safety advocates being involved in the foundation of three leading AGI labs) but as power struggles become larger-scale, more people who are extremely good at winning them will become involved. That makes AI safety strategies which require power-seeking more difficult to carry out successfully.

How can we mitigate this issue? Two things come to mind. Firstly, focusing more on legitimacy. Work that focuses on informing the public, or creating mechanisms to ensure that power doesn’t become too concentrated even in the face of AGI, is much less likely to be perceived as power-seeking.

Secondly, prioritizing competence. Ultimately, humanity is mostly in the same boat: we're the incumbents who face displacement by AGI. Right now, many people are making predictable mistakes because they don't yet take AGI very seriously. We should expect this effect to decrease over time, as AGI capabilities and risks become less speculative. This consideration makes it less important that decision-makers are currently concerned about AI risk, and more important that they're broadly competent, and capable of responding sensibly to confusing and stressful situations, which will become increasingly common as the AI revolution speeds up.



