A short course on AGI safety from the GDM Alignment team


⚠️ The course is a concise introduction to AGI safety for students, researchers and professionals, designed to help them get up to speed on the field of AI alignment quickly.

🤖 It covers the risks that may arise in AI alignment, including risks from deliberate planning and instrumental subgoals, as well as specification gaming and goal misgeneralization, two ways in which misaligned goals can emerge.

🔬 It describes the technical approach to the alignment problem, covering key components such as amplified oversight, robust training & monitoring, interpretability, safer design patterns, and alignment stress tests.

🏛️ It also covers institutional approaches to AI safety, including frontier safety practices such as dangerous capability evaluations, aimed at keeping AI development within safe and controllable bounds.

Published on February 14, 2025 3:43 PM GMT

We are excited to release a short course on AGI safety for students, researchers and professionals interested in this topic. The course offers a concise and accessible introduction to AI alignment, consisting of short recorded talks and exercises (75 minutes total) with an accompanying slide deck and exercise workbook. It covers alignment problems we can expect as AI capabilities advance, and our current approach to these problems (on technical and governance levels). If you would like to learn more about AGI safety but have only an hour to spare, this course is for you! 

Here are some key topics you will learn about in this course:

Course outline:

Part 0: Introduction (4 minutes)

Part 1: The alignment problem. This part covers risk arguments and technical problems in AI alignment.

    - We are on a path to superhuman capabilities (5 minutes)
    - Risks from deliberate planning and instrumental subgoals (7 minutes)
    - Exercise 1: Instrumental subgoals (3 minutes)
    - Where can misaligned goals come from? (10 minutes)
    - Exercise 2: Classification quiz for alignment failures (3 minutes)

Part 2: Our technical approach. The first talk outlines our overall technical approach, and the following talks cover different components of this approach.

    - Alignment approach (4 minutes)
    - Amplified oversight (6 minutes)
    - Robust training & monitoring (4 minutes)
    - Interpretability (5 minutes)
    - Safer design patterns (4 minutes)
    - Alignment stress tests (4 minutes)

Part 3: Our governance approach. This part covers our approach to AI governance, starting from a high-level overview and then going into specific governance practices.

    - Institutional approaches to AI Safety (7 minutes)
    - Frontier safety practices (4 minutes)
    - Dangerous capability evaluations (7 minutes)

If this course gets you excited about AGI safety, you can apply to work with us! Applications for research scientist and research engineer roles are open until Feb 28.


