Musings on Scenario Forecasting and AI

This article explores scenario forecasting for the future development of AI, emphasizing the complexity of such forecasts and the potential risks. It argues that specific good or bad scenarios are easy to refute; what matters more is the proportion of positive versus negative outcomes across all possible scenarios. The author proposes several stable states that AI development might reach, including globally coordinated control, superintelligent tool AI, and the coexistence of multiple superintelligences, and stresses the importance of alignment and safety measures. The article also discusses world modeling in forecasting, dividing the factors that shape AI futures into R&D variables and circumstantial factors, analyzing how they interact, and offering a path toward more accurate forecasts of AI futures.

🌍 **The limits and core question of scenario forecasting**: Specific good or bad AI scenarios are easy to refute; what matters more is estimating, across all possible scenarios, how likely positive versus negative outcomes are, giving a fuller picture of the risk.

🛡️ **Multiple paths to a stable state for AI**: The article outlines several possible stable states, including major powers agreeing to restrict dangerous AI development, superintelligent tool AI without agentic capabilities, and advanced AI developed and controlled through international collaboration, all aimed at reducing the potential risks.

⚠️ **The importance of AI alignment and the remaining threats**: Even if AI alignment turns out to be easy, risks of misuse, conflict, and authoritarian control remain. Ensuring that every superintelligence is aligned and used responsibly is crucial; otherwise a single uncontrolled superintelligence could lead to catastrophic consequences.

🔬 **The role of world modeling in AI forecasting**: The article stresses that accurate forecasts require a world model that separates the factors shaping AI futures into R&D variables (such as AI capabilities, control, and alignment) and circumstantial factors (such as geopolitics and regulation) and analyzes how they interact.

Published on March 6, 2025 12:28 PM GMT

I have yet to write detailed scenarios for AI futures, which definitely seems like something I should do considering the title of my blog (Forecasting AI Futures). I have speculated, pondered, and wondered much in recent weeks—I feel it is time. But first, I have some thoughts about scenario forecasting.

The plan:

    1. Write down general thoughts about scenario forecasting with special focus on AI (this post).
    2. Write one or two specific scenarios for events over the coming months and years.
    3. Wait a few months, see what comes true, and update my scenario forecasting methods.

Other work

In 2021, Daniel Kokotajlo published a scenario titled What 2026 looks like. He managed to predict many important aspects of the progression of AI between 2021 and now, such as chatbots, chain-of-thought, and inference compute scaling.

Now he is collaborating with other forecasters—including Eli Lifland, a superforecaster from the Samotsvety forecasting group—to develop a highly detailed and well-researched scenario forecast under the AI Futures Project. Designed to be as predictively accurate as possible, it illustrates how the world might change as AI capabilities evolve. The scenario is scheduled to be published in Q1 of this year.

I also recommend reading Scale Was All We Needed, At First and How AI Takeover Might Happen in 2 Years, two brilliant stories exploring scenarios with very short timelines to superintelligence.

Using and Misusing Scenario Forecasting

People may hear about a specific superintelligence disaster scenario, and then confidently say something like “That seems entirely implausible!” or “AIs will just be tools, not agents!” or “If it tried that, humans could just [insert solution].”

There is a fundamental issue here: those who see significant risks struggle to convey their concerns to those who think everything is fine.

And I think this problem at least partly has to do with scenario forecasting. One side is envisioning specific bad scenarios, which can easily be refuted. The other side is envisioning specific good scenarios, which can also be easily refuted.

The question we should consider is something more like “Out of all possible scenarios, how many lead to positive outcomes versus negative ones?”. But this question is harder to reason about, and any reasoning about it takes longer to convey.

We can start by considering what avoiding the major risks would mean. The world needs to reach a stable state with minimal risks. For example:

    1. Major powers agree to never develop dangerously sophisticated AI. All significant datacenters are monitored for compliance, and any breach results in severe punishment.
    2. Superintelligent tool AI is developed—a system with no capacity for agentic behavior and no goals of its own. Like the above scenario, there are extremely robust control mechanisms and oversight; no one can ask the AI to design WMDs or develop other potentially dangerous AI systems.
    3. There is a single aligned superintelligence that follows human instructions—whether through a government, a shadow government, the population of a nation, or even a global democratic system. There are advanced superintelligence-powered security measures ensuring that no human makes dangerous alterations to the AI, and reliable measures for avoiding authoritarian scenarios where some human(s) take control of the future and direct it in ways the rest of humanity would not agree to.
    4. There are several superintelligent AIs, perhaps acting in an economy similar to the current one. More superintelligences may occasionally be developed. Humans are still alive and well, and in control of the AIs. There are mechanisms that ensure that all superintelligences are properly aligned, or can't take any action that would harm humans, e.g. through highly advanced formal verification of the AIs and their actions.

There are certainly other relatively stable states. Imagine, for instance, a scenario where AIs are granted rights—such as property ownership and voting. Strict regulation and monitoring ensure that no superintelligence can succeed in killing most or all humans with e.g. an advanced bioweapon. This scenario could, however, lead to AIs outcompeting humans. Unless human minds are simulated in large quantities, AIs would far outnumber humans and have basically all voting power in a democratic system.

For those arguing that there are no significant risks, I ask: What specific positive scenario(s) do you anticipate? Will one of them simply happen by default?

A single misaligned superintelligence might be all it takes to end humanity. Some think the first AI to reach superintelligence will undergo a sharp left turn: capabilities generalize across domains while alignment properties fail to generalize. My impression is that those who think AI-caused extinction is highly probable consider this the major threat, or at least one of the major threats. By default, alignment methods break apart when capabilities generalize, rendering them basically useless, and we lose control over an intelligence much smarter than us.

But what if alignment turns out to be really easy? Risks of carelessness, misuse, conflict, and authoritarian control remain. How do you ensure everyone aligns their superintelligences and uses them responsibly? Some have suggested pivotal acts, such as using the first (hopefully aligned) superintelligence to ensure that no other potentially unsafe superintelligences are ever developed. Others argue that the most advanced AIs should be developed in an international collaborative effort and controlled according to international consensus, hopefully leading to a stable scenario like scenarios 2 and 3 above. See What success looks like for further discussion on how these scenarios may be reached.

When considering questions like “Will AI kill us all?” or “Will there be a positive transition to a world with radically smarter-than-human artificial intelligence?”, I try to imagine stable scenarios like the ones above and estimate the probability that such a state is achieved before some catastrophic disaster occurs.
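To make this concrete, here is a minimal Monte Carlo sketch of that kind of estimate. The per-year probabilities are invented placeholders rather than my actual credences; the point is only to show how "the probability of reaching a stable state before a catastrophe" can be turned into a calculation.

```python
import random

# Toy Monte Carlo estimate of P(a stable state is reached before a catastrophe).
# The per-year probabilities are invented placeholders, not actual credences.
P_STABLE_PER_YEAR = 0.05       # assumed yearly chance of locking in a stable state
P_CATASTROPHE_PER_YEAR = 0.02  # assumed yearly chance of a catastrophic disaster
HORIZON_YEARS = 50
TRIALS = 100_000

def stable_before_catastrophe() -> bool:
    """Simulate one future; True if a stable state arrives first (or neither event happens)."""
    for _ in range(HORIZON_YEARS):
        r = random.random()
        if r < P_STABLE_PER_YEAR:
            return True
        if r < P_STABLE_PER_YEAR + P_CATASTROPHE_PER_YEAR:
            return False
    return True  # neither happened within the horizon; count as non-catastrophic

good = sum(stable_before_catastrophe() for _ in range(TRIALS))
print(f"P(stable state before catastrophe) ≈ {good / TRIALS:.2f}")
```

With these placeholder numbers the simulation lands around 0.7, but that value is an artifact of the inputs; the useful debate is about what the per-year probabilities should actually be.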

Please comment with the type of stable scenario you find most likely! Which one should we aim for?

Some Basic World Modeling

Predictively accurate forecasting scenarios should not be written the way you write fiction—they should follow the rules of probability, as well as cause and effect. They should tell you about things you might actually see, which requires that all details of the scenario are consistent with each other and with the current state of the world.

This requires some world modeling.

I will provide an example. While it might not be the best model, or entirely comprehensive, it should serve to illustrate my way of thinking about forecasting. For a more thorough world modeling attempt, see Modeling Transformative AI Risk (MTAIR).

When forecasting, I categorize facts and events. For instance, benchmark results fall under AI Capabilities, while AI misuse cases fall under AI Incidents. Let’s call these categories variables—things that feel especially important when thinking about AI futures. These variables affect each other—often in complex ways, as detailed below. The variables can in turn be categorized into Research & Development (R&D) Variables and Circumstantial Variables. Under each variable, I have included the other variables it affects, along with a description of the relationship.

Research and Development (R&D) Variables

Circumstantial Variables

We can analyze more complex interactions between the variables. For instance, a misaligned AI with sufficiently advanced capabilities can circumvent its control mechanisms, which increases incident risks. An AI lab that is confident in the alignment of its AI will also be more confident in its control, motivating the lab to use the AI to further automate its research and development.
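As a sketch of how these relationships could be written down explicitly, here is one possible way to represent the variables as a small influence graph. The specific variables and edges are illustrative examples based on the interactions mentioned above, not a complete model.

```python
from dataclasses import dataclass, field

@dataclass
class Variable:
    name: str
    category: str  # "R&D" or "Circumstantial"
    affects: dict[str, str] = field(default_factory=dict)  # target variable -> how it is affected

# Illustrative variables and edges, loosely based on the examples above.
variables = {
    "AI Capabilities": Variable("AI Capabilities", "R&D", {
        "AI Incidents": "more capable systems can do more damage if misaligned or misused",
    }),
    "AI Alignment": Variable("AI Alignment", "R&D", {
        "AI Control": "confidence in alignment increases confidence in control",
        "AI Capabilities": "labs confident in their AI's alignment automate more of their own R&D",
    }),
    "Regulation": Variable("Regulation", "Circumstantial", {
        "AI Capabilities": "compute monitoring and rules can slow or redirect development",
    }),
}

def influences_on(target: str) -> list[str]:
    """List which variables affect `target`, and through what mechanism."""
    return [
        f"{v.name} -> {target}: {how}"
        for v in variables.values()
        for t, how in v.affects.items()
        if t == target
    ]

for edge in influences_on("AI Capabilities"):
    print(edge)
```

A structure like this makes it easy to check, for any variable in a scenario, which other variables should plausibly have moved along with it.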

With these variables and their interactions in place, we can craft plausible scenarios:

The future will surely be a confusing combination of an innumerable number of scenarios such as these.

Crossroads and Cruxes

Scenarios may involve small details with far-reaching implications—things that could be called ‘crossroads’ or ‘cruxes’.

Consider these example scenarios:

If you consider it highly likely that a certain crossroads or crux will shape the future, you can target it for greater impact. You could aim to present valuable information or advice to the committee in the first example, work on security at the leading AI labs to avoid exfiltration, or work on interpretability to discover scheming and misalignment hidden in an AI's weights or internal processes.

I’m not saying these particular scenarios are necessarily very likely; they are just for illustration.

Say you want to contribute towards solving a large, complicated problem. You could tackle the central issues, or contribute in some way that is helpful regardless of how the central problems are solved: find and work on a subproblem that occurs in most scenarios. Alternatively, instead of working on the central parts or sub-problems, consider actions that improve the overall circumstances in most scenarios—e.g. providing valuable resources and information—such as forecasting!

Dealing with Uncertainty

I think it is quite likely that there will be autonomous self-replicating AI proliferating over the cloud at some point before 2030 (70% probability). But what would be the consequences? I could imagine that it barely affects the world; the AIs fail to gain significant amounts of resources due to fierce competition and generally avoid attracting attention, since they don’t want to be tracked and shut down. I could also imagine thousands or millions of digital entities circulating freely and causing all kinds of problems—billions or trillions of USD in damage—with it being basically impossible to shut them all down.

I know too little about the details. How easy is it for the AIs to gain resources? How hard is it to shut them down? How sophisticated could they be? I’ll have to investigate these things further. The 70% probability estimate largely reflects randomness of future events. This would not be true of any probability estimates I might make about the potential effects. They would not reflect randomness in the world—they would mostly reflect my own uncertainty due to ignorance.

Or consider the consequences of AI-powered mass manipulation campaigns—I have no idea how easy it is to manipulate people! People are used to being bombarded with things competing for their attention and trying to influence their beliefs and behavior, but AIs open up new spaces of persuasion opportunities. Could you avoid manipulation by an AI friend that is really nice and shares all your interests? Again, my uncertainty doesn’t reflect randomness in the world, but a lack of understanding of how effective manipulation attempts may be.

Inevitably, when creating scenarios, there will be many things like this—things that you don’t know enough about yet. Perhaps no one does.

So, let’s separate these different forms of uncertainty (basically aleatoric and epistemic uncertainty). Ignorance about certain details should not deter us from constructing insightful scenarios. I may include proliferation in many scenarios but imagine vastly different consequences—and be clear about my uncertainty regarding such effects.
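As a sketch of how the two kinds of uncertainty can be kept apart in practice, here is a toy nested simulation for the proliferation example: the outer loop draws a world model (epistemic uncertainty about how much damage is even possible), and the inner loop draws outcomes within that world (aleatoric randomness). All numbers are invented placeholders.

```python
import random

# Toy separation of epistemic uncertainty (what is even possible) from aleatoric
# uncertainty (how events happen to unfold), for the proliferation example.
# All numbers are invented placeholders.

def sample_world_model() -> float:
    """Epistemic draw: the damage ceiling (billions of USD) self-replicating AIs could cause.
    Very wide, because I simply don't know the underlying facts."""
    return random.lognormvariate(2.0, 2.0)  # spans roughly single billions to trillions

def sample_outcome(damage_ceiling: float) -> float:
    """Aleatoric draw: given a fixed world, how bad things actually turn out."""
    return random.uniform(0, damage_ceiling)

expected_damage = []
for _ in range(10_000):
    ceiling = sample_world_model()                            # resolves ignorance about the world
    outcomes = [sample_outcome(ceiling) for _ in range(100)]  # randomness within that world
    expected_damage.append(sum(outcomes) / len(outcomes))

expected_damage.sort()
print(f"median expected damage: {expected_damage[len(expected_damage) // 2]:.1f} B USD")
print(f"90th percentile:        {expected_damage[int(0.9 * len(expected_damage))]:.1f} B USD")
```

The spread across outer-loop runs is driven almost entirely by the epistemic draw; more information would narrow it, whereas the aleatoric randomness would remain even with perfect knowledge.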

There are a few trends: better benchmark performance, steadily improving chips, and increasingly large training runs, to name a few. Unless there are major interruptions, you can expect the trends to continue, forming a backbone for possible scenarios. But even those trends will break at some point—maybe in a few years, maybe in a few weeks.
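As an illustration of using a trend as a scenario backbone, here is a toy extrapolation of a single trend with an assumed break year. The growth rate and the break year are placeholders, not forecasts.

```python
# Toy extrapolation of one trend (frontier training compute) with an assumed break point.
# The growth rate and break year are placeholders, not forecasts.
GROWTH_PER_YEAR = 4.0                # assumed compute multiplier per year while the trend holds
BREAK_YEAR = 2028                    # assumed year the trend stops (e.g. power or chip constraints)
BASE_YEAR, BASE_COMPUTE = 2025, 1.0  # compute in arbitrary units relative to today

def projected_compute(year: int) -> float:
    """Extrapolate the trend, freezing it after the assumed break year."""
    effective_years = max(min(year, BREAK_YEAR) - BASE_YEAR, 0)
    return BASE_COMPUTE * GROWTH_PER_YEAR ** effective_years

for y in range(2025, 2031):
    print(y, f"{projected_compute(y):8.1f}x")
```

Different choices of break year generate the skeletons of quite different scenarios.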

In some scenarios I've encountered, it's unclear which parts are well-founded and which are wild guesses—and I want to know! At times, I have had seemingly large disagreements with people that, upon closer inspection, were just slightly different intuitions about uncertain details that neither party fully understood. We focused our attention on unimportant points to disentangle non-existent disagreements.

I hope that by clearly formulating the reasoning behind these scenarios and identifying which parts are mostly guesses, we can avoid this pitfall and use scenario forecasting as a powerful tool for constructive debate.

Thank you for reading!

 

P.S. For updates on future posts, consider subscribing to Forecasting AI Futures!


