Published on October 25, 2024 6:00 PM GMT
What labs should do
- The table/list in Towards best practices in AGI safety and governance: A survey of expert opinion (GovAI: Schuett et al. 2023): a collection of many briefly-described safety practices
- Responsible Scaling Policies and Key Components of an RSP (METR 2023)
- Model evals for dangerous capabilities (Stein-Perlman 2024): just the summary
- Safety cases (arguments that an AI system is safe to deploy): Clymer twitter thread
- Structured access (GovAI: Shevlane 2022): structured access is about deploying systems to get some of the benefits of open-sourcing without all of the costs. Releasing via API is a simple example of structured access.
- More: Deployment Corrections (IAPS: O'Brien et al. 2023); Open-Sourcing Highly Capable Foundation Models (GovAI: Seger et al. 2023)
- The case for ensuring that powerful AIs are controlled (Redwood: Greenblatt and Shlegeris 2024): intro and "Appendix: Control techniques from our paper"
- Threat model: this quote[2] and The prototypical catastrophic AI action is getting root access to its datacenter (Shlegeris 2022)
- Racing through a minefield: the AI deployment problem (Karnofsky 2022): describing the dilemma labs face when deciding how to deploy powerful AI and proposing high-level actions for a lab to improve safety
- The Checklist: What Succeeding at AI Safety Will Involve (Bowman 2024)
- What AI companies should do: Some rough ideas (Stein-Perlman 2024)
OpenAI[3]
- Preparedness Framework (Beta): OpenAI's version of an RSP, describing risk assessment and how to respond to risk assessment results
- Optional commentary: Zach Stein-Perlman, Zvi Mowshowitz
Resources
- Newsletters
  - Transformer: aggregation and commentary on news relevant to AI safety; also, journalism and analysis; often aimed at normies
  - AI Lab Watch: what labs are doing and should do, both research and distillation, for high-context AI safety people
  - Garrison Lovely: journalism and analysis, often aimed at normies
  - Zvi: everything, too long
  - Center for AI Safety: occasional AI safety news & analysis
Suggestions are welcome. You can put suggestions that don't deserve their own LW comment in this doc.
- ^
There are two main lines of defense you could employ to prevent schemers from causing catastrophes.
- Alignment: Ensure that your models aren't scheming.
- Control: Ensure that even if your models are scheming, you'll be safe, because they are not capable of subverting your safety measures.
Source: The case for ensuring that powerful AIs are controlled (Redwood: Greenblatt and Shlegeris 2024).
- ^
[Maybe a lot of early AI risk—risk from AIs that are just powerful enough to be extremely useful—]comes from the lab using AIs internally to do AI development (by which I mean both research and engineering). This is because the AIs doing AI development naturally require access to compute and model weights that they can potentially leverage into causing catastrophic outcomes—in particular, those resources can be abused to run AIs unmonitored.
Using AIs for AI development looks uniquely risky to me among applications of early-transformative AIs, because unlike all other applications I know about:
- It’s very expensive to refrain from using AIs for this application.
- There’s no simple way to remove affordances from the AI such that it’s very hard for the AI to take a small sequence of actions which plausibly lead quickly to loss of control. In contrast, most other applications of AI probably can be controlled just by restricting their affordances.
Source: Shlegeris 2024.
- ^
I wrote this post because I'm helping BlueDot create/run a lab governance session. One constraint they impose is focusing on OpenAI, so I made an OpenAI section. Other than that, this doc is just my recommendations.