Lab governance reading list

This article covers many aspects of AGI safety and governance, including what labs should do, collections of safety practices, model evaluations, and control strategies, as well as AI companies' current practices and related resources.

📋 The article points to practices for AGI safety and governance, such as the many briefly described safety practices collected in Towards best practices in AGI safety and governance.

🛡️ It emphasizes the importance of ensuring that powerful AIs are controlled, including control techniques and ways of responding to risk assessment results.

💡 It discusses the dilemma labs face when deploying powerful AI and the high-level actions they should take, as described in Racing through a minefield.

📑 It mentions related resources, such as newsletters, AI Lab Watch, and the Center for AI Safety, and their roles in AI safety.

Published on October 25, 2024 6:00 PM GMT

What labs should do

OpenAI[3]

Resources

Suggestions are welcome. You can put suggestions that don't deserve their own LW comment in this doc.

  1. ^

     There are two main lines of defense you could employ to prevent schemers from causing catastrophes.

      Alignment: Ensure that your models aren't scheming.
      Control: Ensure that even if your models are scheming, you'll be safe, because they are not capable of subverting your safety measures.

    Source: The case for ensuring that powerful AIs are controlled (Redwood: Greenblatt and Shlegeris 2024).

  2. ^

    [Maybe a lot of early AI risk—risk from AIs that are just powerful enough to be extremely useful—]comes from the lab using AIs internally to do AI development (by which I mean both research and engineering). This is because the AIs doing AI development naturally require access to compute and model weights that they can potentially leverage into causing catastrophic outcomes—in particular, those resources can be abused to run AIs unmonitored.

    Using AIs for AI development looks uniquely risky to me among applications of early-transformative AIs, because unlike all other applications I know about:

      It’s very expensive to refrain from using AIs for this application.
      There’s no simple way to remove affordances from the AI such that it’s very hard for the AI to take a small sequence of actions which plausibly lead quickly to loss of control. In contrast, most other applications of AI probably can be controlled just by restricting their affordances.

    Source: Shlegeris 2024.

  3. ^

     I wrote this post because I'm helping BlueDot create/run a lab governance session. One constraint they impose is focusing on OpenAI, so I made an OpenAI section. Other than that, this doc is just my recommendations.



