Reframing AI Safety as a Neverending Institutional Challenge

The article examines AI safety and argues that it should not be reduced to a purely technical alignment problem, but should instead be treated as an ongoing, unglamorous institutional challenge. It contends that the AI safety field currently over-emphasizes technical solutions such as alignment while neglecting broader risks, including power shifts, systemic effects, and malicious human action. As AI rapidly advances and proliferates, the article stresses, it is essential to build effective institutional checks and balances, increase transparency and accountability, and strengthen disaster preparedness. AI safety, it concludes, is not a problem that can be solved once and for all, but a struggle that demands long-term investment and sustained effort.

🤔 Traditional approaches to AI safety lean heavily on technical alignment, i.e., ensuring that AI systems act according to human intentions, but this approach overlooks systemic risks and the potential harms of malicious human behavior.

🌍 The article argues that AI's future is more likely to unfold as technological proliferation than as a single "singularity" moment, so addressing AI safety requires a long-term perspective rather than a focus on any one critical point in time.

💡 The article stresses the importance of building strong institutions, including greater governmental capacity, increased transparency and accountability, and improved disaster preparedness, to meet the long-term challenges posed by AI.

💰 The article questions who benefits from technical solutionism, noting that large AI companies may entrench their own power by emphasizing technical alignment while downplaying broader societal risks.

Published on March 23, 2025 12:13 AM GMT

Crossposted from https://stephencasper.com/reframing-ai-safety-as-a-neverending-institutional-challenge/

Stephen Casper

“They are wrong who think that politics is like an ocean voyage or a military campaign, something to be done with some particular end in view, something which leaves off as soon as that end is reached. It is not a public chore, to be got over with. It is a way of life.”
– Plutarch 

“Eternal vigilance is the price of liberty.”
– Wendell Phillips

“The unleashed power of the atom has changed everything except our modes of thinking, and we thus drift toward unparalleled catastrophe.” 
– Albert Einstein

“Technology is neither good nor bad; nor is it neutral.”
– Melvin Kranzberg 

“Don’t ask if artificial intelligence is good or fair, ask how it shifts power.”
– Pratyusha Kalluri

“Deliberation should be the goal of AI Safety, not just the procedure by which it is ensured.”
– Roel Dobbe, Thomas Gilbert, and Yonatan Mintz

 

As AI becomes increasingly transformative, we need to rethink how we approach safety – not as a technical alignment problem, but as an ongoing, unsexy struggle. 

“What are your timelines?” This question constantly reverberates around the AI safety community. It reflects the idea that there may be some critical point in time in which humanity will either succeed or fail in securing a safe future. It stems from an idea woven into the culture of the AI safety community: that the coming of artificial general intelligence will likely either be apocalyptic or messianic and that, at a certain point, our fate will no longer be in our hands (e.g., Chapter 6 of Bostrom, 2014). But what exactly are we planning for? Why should one specific moment matter more than others? History teaches us that transformative challenges tend to unfold not as pivotal moments, but as a series of developments that test our adaptability. 

An uncertain future: AI has the potential to transform the world. If we build AI that is as good or better than humans at most tasks, at a minimum, it will be highly disruptive; at a maximum, it could seriously threaten the survival of humanity (Hendrycks et al., 2023). And that’s not to mention the laundry list of non-catastrophic risks that advanced AI poses (Slattery et al., 2024). The core goal of the AI safety community is to make AI’s future go better, but it is very difficult to effectively predict and plan for what is next. Forecasting the future of AI has been fraught, to say the least. Past agendas for AI safety have centered around challenges that bear limited resemblance to the key ones we face today (e.g., Soares and Fallenstein, 2015; Amodei et al., 2016; Everitt et al., 2018; Critch and Krueger, 2020). Meanwhile, many of the community’s investments in scaling labs, alignment algorithms (e.g., Christiano et al., 2017), and safety-coded capabilities research have turned out to be, at best, of doubtful net benefit for safety (e.g., Ahmed et al., 2023; Ren et al., 2024).

The Pandora’s Box: If AI becomes highly transformative and humanity survives it, two outcomes are possible: either some coherent coalition will monopolize the technology globally, or it will proliferate. In other words, some small coalition will take control of the world—or one will not. While the prospect of global domination is concerning for obvious reasons, there are also strong reasons to believe it is unlikely. Power is deeply embedded in complex global structures that are resistant to being overthrown. Moreover, possessing the capacity for world domination does not guarantee that it will happen. For example, in the 1940s, nuclear weapons could have enabled a form of global control, yet no single nation managed to dominate the world. Much of the alarm within the AI safety community today resembles what a movement centered around nuclear utopianism or dystopianism might have looked like in the 1930s. Historically, transformative technologies have consistently spread once developed, and so far, AI shows every sign of following the same pattern.

The risks we face: Imagine a future where AI causes significant harm. What might lead to such an outcome? One scenario involves complex systemic effects (Arnscheidt et al., 2024; Kulveit et al., 2025). Another non-mutually-exclusive possibility is for harm to arise from the choices of some identifiable set of actors. In cases involving such perpetrators, the event would fall on a spectrum ranging from a pure accident to a clearly intentional act. We can taxonomize catastrophic AI risks accordingly.

Will technical solutions save us? Inside the AI safety community, it is common for agendas to revolve around solving the technical alignment challenge: getting AI systems’ actions to serve the goals and intentions of their operators (e.g., Amodei et al., 2016; Everitt et al., 2018; Critch and Krueger, 2020; Ji et al., 2023). This focus stems from a long history of worrying that rogue AI systems pursuing unintended goals could spell catastrophe (e.g., Bostrom, 2014). However, the predominance of these concerns seems puzzling in light of our risk profile (Khlaaf, 2023). Alignment can only be sufficient for safety if (1) no catastrophes result from systemic effects and (2) the people in control of the AI are reliably benevolent and responsible. Both are highly uncertain. And if/when transformative AI opens a Pandora’s box, it will become almost inevitable that some frontier developers will be malicious or reckless. On top of this, being better at technical alignment could also exacerbate risks by speeding up timelines and enabling more acutely harmful misuse.

If AI causes a catastrophe, what are the chances that it will be triggered by the choices of people who were exercising what would be considered to be “best safety practices” at the time? The history of human conflict and engineering disasters (Dekker, 2019) suggests that these chances are very small. For example, consider nuclear technology. The technical ‘nuclear alignment’ problem is virtually solved: nuclear tech reliably makes energy when its users want and reliably destroys cities when its users want. To be clear, it is very nice to live in a world in which catastrophic failures of nuclear power plants are very rare. Nonetheless, for the foreseeable future, we will be in perpetual danger from nuclear warfare—not for technical reasons, but institutional ones. 

When all you have is a hammer… The AI safety community’s focus on technical alignment approaches can be explained, in part, by its cultural makeup. The community is diverse and discordant. Yet, throughout its existence, it has been overwhelmingly filled with highly technical people trained to solve well-posed technical problems. So it makes sense for technical people to focus on technical alignment challenges to improve safety. However, AI safety is not a model property (Narayanan and Kapoor, 2024), and conflating alignment with safety is a pervasive mistake (Khlaaf, 2023). 

Who benefits from technosolutionism? The overemphasis on alignment can also be explained by how useful it is for technocrats. For example, leading AI companies like OpenAI, Anthropic, Google DeepMind, etc., acknowledge the potential for catastrophic AI risks but predominantly frame these challenges in terms of technical alignment. And isn’t it convenient that the alignment objectives that they say will be needed to prevent AI catastrophe would potentially give them trillions of dollars and enough power to rival the very democratic institutions meant to address misuse and systemic risks? Meanwhile, the AI research community is full of people who have much more wealth and privilege than the vast majority of others in the world. Technocrats also tend to benefit from new technology instead of being marginalized by it. In this light, it is unsurprising to see companies that gain power from AI push the notion that it’s the AI and not them that we should fear – that if they are just able to build and align superintelligence (before China, of course) we might be safe. Does this remind you of anything? 

The long road ahead: If we are serious about AI safety, we need to be serious about preparing for the Pandora’s box that will be opened once transformative AI proliferates. At that point, solutions for technical alignment would sometimes be useful. However, they would also fail to protect us from most disaster scenarios, exacerbate misuse risks, concentrate power, and accelerate timelines. Unless we believe in an AI messiah, we can expect the fight for AI safety to be a neverending, unsexy struggle. This underscores a need to build institutions that place effective checks and balances on AI. The more pressing question is not what kind of AI systems to develop, but how to shape the AI ecosystem in a way that better enables the ongoing work of identifying, studying, and deliberating about risks. In light of this, the core priorities of the AI safety community should be to build governmental capacity (e.g., Carpenter & Ezell, 2024; Novelli et al., 2024), increase transparency & accountability (e.g., Uuk et al., 2024; Casper et al., 2025), and improve disaster preparedness (e.g., Wasil et al., 2024; Bernardi et al., 2024). These are the defining challenges on which long-term AI safety will depend.


