AI Optimization, not Options or Optimism

This article responds to Eric Drexler's views on AI, with the author drawing a core conceptual distinction between "control" and "alignment". The author argues that we can achieve effective "control" of AI in sufficiently narrow domains, but that as AI becomes more general we should focus instead on "aligning" its goals with human values to avoid the risk of eventual loss of control. The article also introduces the category of "Outcome Influencing Systems" (OIS), pointing out that socio-technical systems, humans included, are themselves OISs whose encoded preferences may not match human friendliness, so alongside building AI we must also examine and improve the human systems doing the building. The author stresses that successful AI development requires science and engineering working in concert, together with careful choices about the development path, for example using international cooperation to manage the AI arms race and ensure that sound safety theory and control mechanisms are in place before the technology matures.

🎯 **Distinguishing control from alignment**: The author distinguishes "control" of AI from "alignment". "Control" refers to cases where an AI's domain is narrow enough that its operational outcomes can be confidently controlled and it can be treated as a tool. "Alignment" instead works from the inside out, studying how goals can be encoded into AI, how optimization systems can be set up to robustly pursue those goals, and which goals it is prudent for humanity to set as AI becomes increasingly capable. The author argues that over-reliance on "control" risks a dangerous eventual loss of control, making "alignment" to human-friendly goals essential.

🌍 **The ubiquity of Outcome Influencing Systems (OIS)**: The article introduces the concept of "Outcome Influencing Systems" (OIS), defined as systems that have capabilities and use them to influence the future toward outcomes that suit their preferences. The author points out that any socio-technical system composed of technology and humans is itself an OIS, whose preferences may be encoded in a distributed way across technology and human brains. This means it is not enough to be careful in building powerful AI; we must also attend to and improve the human systems developing AI, ensuring that they, too, remain aligned with human friendliness.

🤝 **Promoting effective action and cooperation**: The author argues that, on questions of AI development, more people need to act as "participants" rather than mere "spectators" and take effective action. Citing support for PauseAI as an example, the author emphasizes the importance of international cooperation and government regulation to slow the AI arms race and of developing AI alignment and safety theory. Science and engineering should advance together: theoretical progress guides engineering practice, and engineering tools in turn feed back into scientific research, jointly advancing safe AI development.

💡 **The "unrealistic" reality and careful progress**: Faced with the difficulty that achieving the goal may require "multiple unprecedented successes", the author notes that this can cause success scenarios to be overlooked. However, the author argues that a "reductionist" approach, building and testing theories within existing paradigms and treating ASI as a special case, can make "unprecedented success" far less unrealistic. This requires studying ASI in advance and finding realistic paths to achieving it, rather than chasing immediate success.

⏸️ **A careful pause to ensure safety**: The author does not oppose AI development; on the contrary, they consider it vitally important. Precisely because of that importance, the author advocates a careful pause (PauseAI) on further development until we can proceed with justified confidence, having built safety theory and control mechanisms commensurate with the technology's potential. This is a call for responsible development, aimed at avoiding the potentially catastrophic consequences of pressing ahead before the technology is ready.

Published on August 5, 2025 1:07 AM GMT

This post is a response to Eric Drexler's recent article, "AI Options, not 'Optimism'", and to his ideas more generally as I understand them.

Control and Alignment

My thinking has been shifted by Drexler's point of view, and I have a great deal of respect for his writing and ideas, but the biggest remaining schism I see between our views concerns what I call "control" versus "alignment".

The two are nuanced and interrelated, but roughly, "control" is what is done with AI whose domain is narrow enough that its operational outcomes can be confidently controlled. Sufficiently narrow AI appears more as a tool. The more general an AI becomes, the more it must be thought of as an agent whose behaviour we are commanding or otherwise manipulating.

On the other hand, "alignment" works in the inside-out rather than the outside-in direction: it starts with the study of how goals can be encoded, how optimization systems can be set up to robustly target those goals, and which goals it is prudent for humanity to set as our systems become more and more capable.

It seems to me that Drexler's views focus on control, which is maybe what he means by "steerable" AI, and on the idea that control of sufficiently capable systems will enable the control of, and the development of systems for control of, even more capable systems. In my own view, the instability in cascading control systems of extremely great capability is very dangerous. We may be able to map out and understand this danger in order to progress safely to higher-capability systems than we otherwise could, but at some point the system's encoded goals must be aligned to human friendliness, or an eventual catastrophic loss of control will take place. It is completely possible I have misunderstood his views, in which case I apologize, but in any case I would be happy to hear more thoughts on this, either from Drexler or from other readers.
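
To make the worry about cascading control more concrete, here is a deliberately simplified toy model (my own illustration, not Drexler's, with made-up numbers): if each layer in a chain of controllers preserves the intent of the layer above it with some fidelity below one, the end-to-end fidelity decays geometrically with the depth of the chain.

```python
# Toy model (my own illustration): a chain of controllers, each of which
# preserves the intent of the layer above it with some fidelity f < 1.
# End-to-end fidelity decays geometrically with depth, so even very
# reliable layers compound into substantial drift.

def end_to_end_fidelity(per_layer_fidelity: float, depth: int) -> float:
    """Fraction of the original intent preserved after `depth` layers,
    assuming each layer independently preserves `per_layer_fidelity`."""
    return per_layer_fidelity ** depth

for f in (0.99, 0.95, 0.90):
    for depth in (1, 5, 10, 20):
        print(f"fidelity/layer={f:.2f}, depth={depth:2d} -> "
              f"end-to-end={end_to_end_fidelity(f, depth):.3f}")

# At 0.95 per layer, 20 layers preserve only ~0.36 of the original intent,
# which is the intuition behind why, on this toy picture, goals eventually
# have to be aligned rather than merely controlled from the outside.
```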

Outcome Influencing Systems

Relatedly, I think there is an important paradigm missing from the ASI discussion. The basic idea is to focus on Outcome Influencing Systems (OISs), which are defined as having capabilities they use to influence the future towards outcomes that suit their preferences. I think some of Drexler's ideas incorporate the implications quite well, but the implication I find most important is that any socio-technical system, composed of technology and humans acting with conventions for communication and action, is itself an OIS with its own preferences, which may be encoded in a distributed way in both the physical technological world and people's physical brains. Despite having people as components, these OISs may not be aligned to human friendliness, in much the same way that ASI might not be aligned to human friendliness. This means it is not sufficient to be careful in the analysis and construction of powerful AI; we must also be careful in the analysis and construction of the human systems that are developing AI.
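
As a rough structural sketch of the definition (the class name, fields, and example below are my own hypothetical illustration, not taken from the draft document), an OIS can be thought of as anything with capabilities and preferences that uses the former to steer outcomes toward the latter:

```python
# A minimal, hypothetical sketch of the OIS abstraction as I read it:
# anything with capabilities and preferences that uses the former to push
# the world toward outcomes ranked highly by the latter.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class OutcomeInfluencingSystem:
    # Actions the system can take to change the state of the world.
    capabilities: List[Callable[[str], str]]
    # A preference ordering over outcomes, here just a scoring function.
    # In a socio-technical OIS this would be encoded in a distributed way
    # across technology and human brains rather than written in one place.
    preference: Callable[[str], float]

    def influence(self, state: str) -> str:
        """Apply whichever capability yields the most preferred outcome."""
        candidates = [act(state) for act in self.capabilities]
        return max(candidates, key=self.preference, default=state)

# Example: a trivial OIS that prefers longer strings and can append text.
ois = OutcomeInfluencingSystem(
    capabilities=[lambda s: s + "!", lambda s: s],
    preference=len,
)
print(ois.influence("hello"))  # -> "hello!"
```

Both a single AI system and the company-plus-market system building it fit this shape, which is why the paradigm applies to both.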

I have been working on a draft of a document to explore and explain the OIS paradigm. I'm far from satisfied with it, but I am beginning to look for people who are willing to engage with and critique the ideas. If you are, or know anyone who might be, please pass this document on to them: https://docs.google.com/document/d/1zzz1omn62KbCO0KVX0oy87aZt6uw6RswLGLbMoAnP0I/edit?usp=sharing

Spectators and Participants

I think Drexler's discussion of spectators and participants points to something important, but it also presents a false dichotomy. We do need more people focused on effective action, not merely voicing disapproval in ways that do not affect our situation.

For example, I am a supporter of PauseAI, an organization that seeks to promote and help develop an international treaty for mutually verifiable de-escalation of the ASI/AGI arms race. This includes government regulation against pushing capabilities further until we have had time to develop AI alignment, safety, and control theory that lets us progress with confidence commensurate with the potential risk and reward of future powerful AI technology.

Good engineers are not in opposition to scientists, with the scientists uselessly speculating while the engineers do all the work. Rather, science and engineering should be symbiotic. The development of scientific theories with more predictive power allows engineers to do greater things more easily, and in turn, scientific instruments can be engineered to gather larger amounts of high-precision data to further develop scientific theory.

The problem I see is not that too many people are focusing on prediction and too few on action. Rather, too many people are focusing on actions that many have predicted carry too high a risk of disaster, and not enough people are focusing on actions that can stop those who push us towards irresponsible development. Both the science and the engineering of AI must be improved before we are ready to progress to ASI.

When realism seems unrealistic

Drexler points out an interesting phenomenon. If we are aiming for a target which is hard to hit, then hitting it is necessarily unlikely, and this may have psychological implications. To quote:

    Successful outcomes require multiple unprecedented successes.
    Scenarios with multiple unprecedented successes seem unrealistic.
    Therefore, scenarios that make success possible get little attention.

I think this is indeed an important dynamic to pay attention to, especially as it relates to ASI, which may only be built once, after which its effects (probably) supersede our ability to alter it or try again. This means ASI is a technology that must do something unprecedented: it must work correctly on the first try.

This, presented with no other details, does indeed make success seem unrealistic, but it is possible to apply reductionism to ASI.
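
To illustrate the arithmetic behind this (the step counts and probabilities below are invented for the example, not estimates of real odds): if success requires several roughly independent unprecedented steps, their probabilities multiply, and raising confidence in each step through prior theory and testing changes the product dramatically.

```python
# Illustrative arithmetic only: the step counts and probabilities are
# made up for the example. If success requires several roughly
# independent unprecedented steps, the chance of getting all of them
# right is the product of the per-step chances.

from math import prod

def joint_success(per_step_probabilities):
    """Probability that every step succeeds, assuming independence."""
    return prod(per_step_probabilities)

blind = [0.2] * 5      # five unprecedented steps attempted "blind"
prepared = [0.9] * 5   # the same steps after theory-building and testing

print(f"blind:    {joint_success(blind):.4%}")    # ~0.0320%
print(f"prepared: {joint_success(prepared):.2%}") # ~59.05%

# The point: "multiple unprecedented successes" only looks hopeless when
# each step is attempted without the theory that would make it routine.
```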

I see no reason we could not create theories within paradigms, examine and test those theories, and show that they should extend to cover ASI as a special case. The OIS paradigm mentioned above is my attempt to develop one such paradigm, but the point is that, as Drexler himself has contributed to, we can study ASI before building it, and if we do so well enough, unprecedented successes need not seem unrealistic. William Tell shooting the apple atop his son's head may have been unprecedented, but his skill at archery was not.

The reason scenarios that make success possible seem to get little attention may be that those scenarios look like PauseAI's goal: negotiating an international treaty to slow down and do things right. And this is not the scenario someone wants to identify as making success possible if that person wants to pretend we could have success now, rather than when we can realistically achieve it.

I do not wish to stop ASI; in fact, I think its development is probably of vital importance. But for that very reason, I do think we must pause ASI.

Thanks for reading.




