Eliezer's Lost Alignment Articles / The Arbital Sequence

Published on February 20, 2025 12:48 AM GMT

Note: this is a static copy of this wiki page. We are also publishing it as a post to ensure visibility.

Circa 2015-2017, a lot of high quality content was written on Arbital by Eliezer Yudkowsky, Nate Soares, Paul Christiano, and others. Perhaps because the platform didn't take off, most of this content has not been as widely read as warranted by its quality. Fortunately, they have now been imported into LessWrong.

Most of the content written was either about AI alignment or math[1]. The Bayes Guide and Logarithm Guide are likely some of the best mathematical educational material online. Amongst the AI Alignment content are detailed and evocative explanations of alignment ideas: some well known, such as instrumental convergence and corrigibility, some lesser known like epistemic/instrumental efficiency, and some misunderstood like pivotal act.

The Sequence

The articles collected here were originally published as wiki pages with no set reading order. The LessWrong team first selected about twenty pages which seemed most engaging and valuable to us, and then ordered them[2][3] based on a mix of our own taste and feedback from some test readers that we paid to review our choices.

Tier 1

These pages are a good reading experience.

1. AI safety mindset: What kind of mindset is required to successfully build an extremely advanced and powerful AGI that is "nice"?
2. Convergent instrumental strategies and Instrumental pressure: Certain sub-goals like "gather all the resources" and "don't let yourself be turned off" are useful for a very broad range of goals and values.
3. Context disaster: Current terminology would call this "misgeneralization". Do alignment properties that hold in one context (e.g. training, while less smart) generalize to another context (deployment, much smarter)?
4. Orthogonality Thesis: The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal.
5. Hard problem of corrigibility: It's a hard problem to build an agent which, in an intuitive sense, reasons internally as if from the developer's external perspective – that it is incomplete, that it requires external correction, etc. This is not default behavior for an agent.
6. Coherent Extrapolated Volition: If you're extremely confident in your ability to align an extremely advanced AGI on complicated targets, this is what you should have your AGI pursue.
7. Epistemic and instrumental efficiency: "Smarter than you" is vague. "Never ever makes a mistake that you could predict" is more specific.
8. Corporations vs. superintelligences: Is a corporation a superintelligence? (An example of epistemic/instrumental efficiency in practice.)
9. Rescuing the utility function: "Love" and "fun" aren't ontologically basic components of reality. When we figure out what they're made of, we should probably go on valuing them anyways.
10. Nearest unblocked strategy: If you tell a smart consequentialist mind "no murder" but it is actually trying, it will just find the next best thing that you didn't think to disallow.
11. Mindcrime: The creation of artificial minds opens up the possibility of artificial moral patients who can suffer.
12. General intelligence: Why is AGI a big deal? Well, because general intelligence is a big deal.
13. Advanced agent properties: The properties of agents for which (1) we need alignment, (2) are relevant in the big picture.
14. Mild optimization: "Mild optimization" is where, if you ask your advanced AGI to paint one car pink, it just paints one car pink and then stops, rather than tiling the galaxies with pink-painted cars, because it's not optimizing that hard. It's okay with just painting one car pink; it isn't driven to max out the twentieth decimal place of its car-painting score.
15. Corrigibility: The property such that if you tell your AGI that you installed the wrong values in it, it lets you do something about that. An unnatural property to build into an agent.
16. Pivotal Act: An act which would make a large positive difference to things a billion years in the future, e.g. an upset of the gameboard that's a decisive "win".
17. Bayes Rule Guide: An interactive guide to Bayes' theorem, i.e., the law of probability governing the strength of evidence - the rule saying how much to revise our probabilities (change our minds) when we learn a new fact or observe new evidence. (A bare statement of the rule is included just after this list.)
18. Bayesian View of Scientific Virtues: A number of scientific virtues are explained intuitively by Bayes' rule.
19. A quick econ FAQ for AI/ML folks concerned about technological unemployment: An FAQ aimed at a very rapid introduction to key standard economic concepts for professionals in AI/ML who have become concerned with the potential economic impacts of their work.
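
As a quick pointer for item 17: the rule the guide teaches is standard Bayes' theorem. A minimal statement (ours, not excerpted from the guide), for a hypothesis H and evidence e:

$$P(H \mid e) = \frac{P(e \mid H)\,P(H)}{P(e)}$$

Equivalently, in odds form:

$$\frac{P(H \mid e)}{P(\lnot H \mid e)} = \frac{P(H)}{P(\lnot H)} \times \frac{P(e \mid H)}{P(e \mid \lnot H)}$$

That is, posterior odds equal prior odds times the likelihood ratio, which is the precise sense in which evidence has a quantifiable "strength".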

 

Tier 2

These pages are high effort and high quality, but are less accessible and/or of less general interest than the Tier 1 pages. 

The list starts with a few math pages before returning to AI alignment topics.

20. Uncountability: Sizes of infinity fall into two broad classes: countable infinities, and uncountable infinities.
21. Axiom of Choice: The axiom of choice states that given an infinite collection of non-empty sets, there is a function that picks out one element from each set.
22. Category theory: Category theory studies the abstraction of mathematical objects (such as sets, groups, and topological spaces) in terms of the morphisms between them.
23. Solomonoff Induction: Intro Dialogue: A dialogue between Ashley, a computer scientist who's never heard of Solomonoff's theory of inductive inference, and Blaine, who thinks it is the best thing since sliced bread.
24. Advanced agent properties: An "advanced agent" is a machine intelligence smart enough that we start considering how to point it in a nice direction.
25. Vingean uncertainty: Vinge's Principle says that you (usually) can't predict exactly what an entity smarter than you will do, because if you knew exactly what a smart agent would do, you would be at least that smart yourself. "Vingean uncertainty" is the epistemic state we enter into when we consider an agent too smart for us to predict its exact actions.
26. Sufficiently optimized agents appear coherent: Agents which have been subject to sufficiently strong optimization pressures will tend to appear, from a human perspective, as if they obey some bounded form of the Bayesian coherence axioms for probabilistic beliefs and decision theory.
27. Utility indifference: A proposed solution to the hard problem of corrigibility.
28. Problem of fully updated deference: One possible scheme in AI alignment is to give the AI a state of moral uncertainty implying that we know more than the AI does about its own utility function, as the AI's meta-utility function defines its ideal target. Then we could tell the AI, "You should let us shut you down because we know something about your ideal target that you don't, and we estimate that we can optimize your ideal target better without you."
29. Ontology identification problem: It seems likely that for advanced agents, the agent's representation of the world will change in unforeseen ways as it becomes smarter. The ontology identification problem is to create a preference framework for the agent that optimizes the same external facts, even as the agent modifies its representation of the world.
30. Edge instantiation: The edge instantiation problem is a hypothesized patch-resistant problem for safe value loading in advanced agent scenarios where, for most utility functions we might try to formalize or teach, the maximum of the agent's utility function will end up lying at an edge of the solution space that is a 'weird extreme' from our perspective.
31. Goodhart's Curse: Goodhart's Curse is a neologism for the combination of the Optimizer's Curse and Goodhart's Law, particularly as applied to the value alignment problem for Artificial Intelligences. (A toy simulation of the Optimizer's Curse follows this list.)
32. Low impact: A low-impact agent is one that's intended to avoid large bad impacts at least in part by trying to avoid all large impacts as such.
33. Executable philosophy: 'Executable philosophy' is Eliezer Yudkowsky's term for discourse about subjects usually considered in the realm of philosophy, meant to be used for designing an Artificial Intelligence.
34. Separation from hyperexistential risk: An AGI design should be widely separated in the design space from any design that would constitute a hyperexistential risk. A hyperexistential risk is a "fate worse than death".
35. Methodology of unbounded analysis: In modern AI and especially in value alignment theory, there's a sharp divide between "problems we know how to solve using unlimited computing power", and "problems we can't state how to solve using computers larger than the universe".
36. Methodology of foreseeable difficulties: Much of the current literature about value alignment centers on purported reasons to expect that certain problems will require solution, or be difficult, or be more difficult than some people seem to expect. The subject of this page's approval rating is this practice, considered as a policy or methodology.
37. Instrumental goals are almost-equally as tractable as terminal goals: One counterargument to the Orthogonality Thesis asserts that agents with terminal preferences for goals like e.g. resource acquisition will always be much better at those goals than agents which merely try to acquire resources on the way to doing something else, like making paperclips. This page is a reply to that argument.
38. Arbital: Solving online explanations: A page explaining somewhat how the rest of the pages here came to be.
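
As a companion to item 31, here is a minimal simulation sketch of the Optimizer's Curse component of Goodhart's Curse: when you choose the option with the highest estimated value, the winner's estimate is systematically biased upward relative to its true value, and the bias grows as the search considers more options. This is our own illustration, not drawn from the Arbital page; the function name, noise model, and parameters are made up for the example.

```python
# Toy illustration of the Optimizer's Curse: selecting on a noisy estimate
# overestimates the chosen option's true value, and more search makes it worse.
# (Illustrative sketch only; names and parameters are our own, not Arbital's.)
import random
import statistics

def optimizers_curse_gap(n_options: int, noise_sd: float = 1.0, trials: int = 10_000) -> float:
    """Average (estimate - true value) for the option chosen by argmax of the noisy estimate."""
    gaps = []
    for _ in range(trials):
        true_values = [random.gauss(0.0, 1.0) for _ in range(n_options)]
        estimates = [v + random.gauss(0.0, noise_sd) for v in true_values]
        best = max(range(n_options), key=lambda i: estimates[i])
        gaps.append(estimates[best] - true_values[best])
    return statistics.mean(gaps)

if __name__ == "__main__":
    for n in (2, 10, 100):
        gap = optimizers_curse_gap(n)
        print(f"{n:>3} candidate plans: winner's estimate exceeds its true value by ~{gap:.2f}")
```

Under these toy assumptions the gap is already positive with two candidates and grows as more candidates are searched over; Goodhart's Curse is the worry that, for an AI selecting over vastly many policies, the same upward bias concentrates on whatever divergence exists between the proxy being optimized and what we actually value.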

Lastly, we're sure this sequence isn't perfect, so any feedback (which you liked/disliked/etc) is appreciated – feel free to leave comments on this page.

 

  1. ^

    Mathematicians were an initial target market for Arbital.

  2. ^

    The ordering here is "Top Hits" subject to a "if you start reading at the top, you won't be missing any major prerequisites as you read along".

  3. ^

    The pages linked here are only some of the AI alignment articles, and the selection/ordering has not been endorsed by Eliezer or MIRI. The rest of the imported Arbital content can be found via links from the pages below and also from the LessWrong Concepts page (use this link to highlight imported Arbital pages).



