A night-watchman ASI as a first step toward a great future

This article explores a proposal in which, around the time superintelligence (ASI) first emerges, a narrowly scoped ASI system -- a "night-watchman ASI" -- maintains world peace. The author argues that the early ASI period will be full of uncertainty and potential crisis: whether AI capabilities end up decentralized or concentrated, the result could be geopolitical conflict or large-scale terrorism. Limiting one ASI's mandate to keeping global peace -- preventing aggression, bioterrorism, and the emergence of competing ASIs -- would protect humanity's survival while avoiding the early lock-in of bad values or political systems. The proposal draws on the "night-watchman state" from political theory, emphasizing minimal intervention: the ASI keeps order rather than managing specific affairs. International agreements and transparency mechanisms would make it credible, helping humanity navigate the early challenges of the ASI era and laying the groundwork for a longer-term future.

🛡️ **Potential risks of the ASI era and the need for a "night watchman"**: The article argues that the period around the emergence of the first superintelligence (ASI) may be chaotic, unstable, and dangerous. Powerful AI capabilities could be dispersed among many actors, enabling small groups to carry out large-scale bioterror attacks, or concentrated in the hands of a few entities, triggering geopolitical conflict. This uncertainty remains even if the alignment problem is solved. The author therefore proposes a "night-watchman ASI" whose core duty is to maintain world peace and prevent major catastrophes without sacrificing the value of the long-term future, providing a safety backstop during the early challenges of the ASI era.

⚖️ **The night-watchman ASI's duties and limits**: Drawing on the "night-watchman state" from political theory, the author envisions a superhuman AI system whose functions are strictly limited to keeping world peace. Its duties include preventing aggression between states (such as invasions of sovereign countries), stopping large-scale bioterrorism (such as the release of a virus), and preventing the emergence of competing ASIs that could threaten its peacekeeping ability. The night watchman would not intervene in small crimes or disputes, which remain the business of national governments. It would deter threats through peaceful means where possible (warnings, with limited force as a backstop) and would self-improve to preserve its capability advantage, but its existence should not become a permanent ceiling on AI capabilities.

🤝 **Feasibility and the basis of support**: The key to the proposal is its potential for broad support. If the night-watchman ASI is built in a way that lets the major actors (such as the US and China) verify that it really is designed to keep the peace, and if its mandate has clear boundaries, it could win acceptance from all sides. This resembles negotiating an international treaty: compromises may be needed on specific issues (for example, intervention in particular regional conflicts), but agreement is possible given the shared interest in avoiding a catastrophic ASI race. Transparency and auditability are key to earning trust, and the article draws an analogy to the drafting of the U.S. Constitution, where a strong common interest made negotiated compromise possible.

🌌 **Preventing long-term lock-in and preserving the future**: One design goal of the night-watchman ASI is to avoid "lock-in" -- prematurely steering humanity's future into an irreversible and possibly undesirable state. The night watchman's task is to keep the peace and create the conditions for humanity to make major decisions later, when it is wiser, more secure, and more capable. It should not decide how to allocate the universe or establish a permanent form of governance. It is also responsible for blocking other forms of lock-in, such as AI-enforced authoritarianism, and for preventing premature claims to space, so that more options remain open and the possibility of a genuinely good future is preserved.

Published on July 18, 2025 4:40 PM GMT

I took a week off from my day job of aligning AI to visit Forethought and think about the question: if we can align AI, what should we do with it? This post summarizes the state of my thinking at the end of that week. (The proposal described here is my own, and is not in any way endorsed by Forethought.)

Thanks to Mia Taylor, Tom Davidson, Ashwin Acharya, and a whole bunch of other people (mostly at Forethought) for discussion and comments.

And a quick note: after writing this, I was told that Eric Drexler and David Dalrymple were thinking about a very similar idea in 2022, with essentially the same name. My thoughts here are independent of theirs.

 

The world around the time of ASI will be scary

I expect the time right around when the first ASI gets built to be chaotic, unstable, and scary. This is true even if we fully solve the alignment problem, for a few reasons:

- Powerful AI capabilities might end up widely diffused across many actors, letting small groups cause enormous harm (for example, large-scale bioterrorism).
- Or those capabilities might end up concentrated in the hands of a few actors, creating intense geopolitical conflict over who controls them.

Either way, I think the world’s #1 priority during this time should be existential security. In other words:

So, let’s say we find ourselves in such a world: we think we know how to build aligned AIs, but the world is still scary. What do we do?

My proposal is that a leading actor (or coalition of leading actors) build a night-watchman ASI. In one sentence, this means a super-human AI system whose purview is narrowly scoped to maintain world peace. The rest of this post elaborates on this proposal.

I think the specific proposal outlined below makes the most sense if the world looks something like this:

However, I think the proposal (or modifications of it) is workable in somewhat different worlds as well (see more here).

It may be helpful to think of the night-watchman ASI as the centerpiece of a US-China AI treaty that averts an all-out race to ASI. This isn’t the only way that we might get a night-watchman ASI, but it’s one of the more plausible ways.

 

The night-watchman ASI

The night-watchman state is a concept from political theory that was popularized by Robert Nozick. A night-watchman state is a form of government that:

    1. Protects people from rights violations (e.g. physical violence and theft); and
    2. Preserves its monopoly on violence (e.g. by dismantling militias that threaten to limit its ability to do #1).

Essentially, the night-watchman state is the minimal possible government that fulfills the basic duty of “keeping the peace”.

I think that in the world I describe above, it makes sense for an ASI to fulfill these basic duties, but at a geopolitical scale. (For the rest of this post, I’ll be calling this ASI the night watchman.) I’ll go into some more details later, but some central examples of the night watchman’s responsibilities are:

- preventing aggression between states (such as invasions of sovereign countries);
- preventing large-scale bioterrorism (such as the release of a dangerous virus); and
- preventing the emergence of competing ASIs that could threaten its ability to keep the peace.

 

Three key properties

I like this idea because I think some version of the idea has three key properties:

- All of the major powers could plausibly get on board with it.
- It protects humanity in the short run, through the most dangerous part of the transition.
- It does not itself constitute a major lock-in of values or institutions.

I will argue briefly for each of these points later, but first, I’ll elaborate on the night watchman’s responsibilities.

 

The night watchman’s responsibilities

Here’s a brief description of what I’m imagining the night watchman will do.

First and foremost: keeping the peace

Centrally, this means preventing large-scale geopolitical aggression (such as invasions of sovereign states) and catastrophes (such as a genocide or a bioterrorist releasing a virus). I don’t especially think that the night watchman needs to be involved in small crimes and disputes (such as one-off murders) -- that can be dealt with in conventional ways by nation-states.

Early on, it might make sense for the powers that build the night watchman to give the night watchman the forces and resources it needs in order to keep the peace. That said, I think that the night watchman will be able to keep the peace through peaceful means. If it observes Russia preparing to invade Poland, it will tell Russia “Hey, I see that you’re preparing to invade Poland. You won’t succeed, because I’m way more powerful than you.” At this point, it would be rational for Russia to back off, but if it doesn’t, the night watchman will destroy its weapons without injuring humans.
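
To make the escalation path concrete, here is a minimal Python sketch of the graduated response described above: warn first, and only disable military assets (without harming people) if the warning is ignored. The names and structure (ThreatReport, keep_the_peace, and so on) are hypothetical illustrations, not part of the proposal.

```python
from dataclasses import dataclass

@dataclass
class ThreatReport:
    """A hypothetical observation of large-scale aggression being prepared."""
    aggressor: str
    target: str
    stood_down: bool  # did the aggressor back off after being warned? (supplied here for the toy example)

def issue_warning(report: ThreatReport) -> None:
    # Step 1: tell the aggressor, publicly and specifically, what was observed.
    print(f"Warning to {report.aggressor}: preparations to attack "
          f"{report.target} have been observed and will not succeed.")

def disable_military_assets(report: ThreatReport) -> None:
    # Step 2 (only if the warning is ignored): destroy the weapons involved
    # without injuring humans. Placeholder for the actual intervention.
    print(f"Disabling the weapons {report.aggressor} has aimed at {report.target}.")

def keep_the_peace(report: ThreatReport) -> None:
    """Graduated response: peaceful means first, limited force as a last resort."""
    issue_warning(report)
    if not report.stood_down:
        disable_military_assets(report)

# Toy example mirroring the scenario in the text.
keep_the_peace(ThreatReport(aggressor="Russia", target="Poland", stood_down=False))
```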

It’s possible that real-world compromises will need to be made to this “keeping the peace” ideal, in order to get the major powers on board. For example, maybe ideally there would be no Chinese invasion of Taiwan, but an explicit carve-out would be made in order to get China on board. This would be sad (in my view), but perhaps necessary.

There will also be edge cases, where it’s unclear whether something falls under the night watchman’s purview. More below on how to deal with edge cases.

 

Minimally intrusive surveillance

The night watchman will need to observe the world in enough detail that it can keep the peace. This is easy for large-scale threats like one country invading another. It’s a little trickier for threats like bioterrorism, but (my guess is) ultimately not that hard.

 

Preventing competing ASIs

The biggest threat to the night watchman’s ability to keep the peace is other ASIs. And so it’ll either prevent training runs that might create such ASIs, or audit the ASIs in order to ensure that they will not take actions that the night watchman would want to prevent. (This might involve extensive oversight of the training process.)

This might be a sticking point, because countries will likely want to build powerful AI systems of their own. To accommodate this, one of the night watchman’s responsibilities will be to recursively self-improve -- or to build more powerful (and aligned) versions of itself -- in order to raise the ceiling on AI capabilities that it considers safe.
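
As a rough illustration of this policy, here is a small Python sketch in which a proposed training run is either blocked or approved subject to audit, relative to a capability ceiling that rises as the night watchman verifies more capable aligned successors. The names (TrainingRun, review_training_run, raise_ceiling) and the numeric capability scale are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class TrainingRun:
    sponsor: str
    estimated_capability: float  # on some hypothetical capability scale
    auditable: bool              # sponsor agrees to oversight of the training process

# The ceiling below which the night watchman considers new systems safe.
# It rises as the night watchman builds more capable aligned versions of itself.
safe_ceiling = 100.0

def review_training_run(run: TrainingRun) -> str:
    """Decide whether a proposed training run may proceed."""
    if run.estimated_capability >= safe_ceiling:
        return "blocked: could threaten the night watchman's ability to keep the peace"
    if not run.auditable:
        return "blocked: training process cannot be audited"
    return "approved: proceeds under audit of the training process"

def raise_ceiling(new_ceiling: float) -> None:
    """Called after the night watchman verifies a more capable aligned successor."""
    global safe_ceiling
    safe_ceiling = max(safe_ceiling, new_ceiling)

print(review_training_run(TrainingRun("ExampleLab", 120.0, auditable=True)))  # blocked
raise_ceiling(150.0)
print(review_training_run(TrainingRun("ExampleLab", 120.0, auditable=True)))  # approved
```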

Where will the night watchman get the resources to self-improve? I’m pretty agnostic about this point, but I think it might be reasonable to allow the night watchman to (ethically) participate in the world economy in a way that lets it gain resources.

 

Preserving its own integrity

The night watchman should prevent attempts to shut it down or modify its aims, except through procedures agreed upon in advance when the night watchman is created.

 

Preventing premature claims to space

Imagine that in 1700, England signed a treaty with the world’s other major powers that gave them all parts of Canada in exchange for all of the Milky Way Galaxy outside of the Earth. I think that such a treaty should be considered illegitimate today, for basically two reasons:

And so, if some country (e.g. Singapore) tries to claim a large part of the lightcone, in exchange for natural resources on Earth or money or whatever, that should also be considered illegitimate. If Singapore tries to send out probes to colonize its claimed portion of the lightcone, the night watchman should stop it from doing so.

I don’t have a fleshed-out story of how exactly parts of space should be “unlocked” to claims over time, but I think that something like this is important to do.
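
Purely as a toy illustration of what “unlocked over time” could mean (the post deliberately leaves the actual schedule open), here is a sketch in which claims are only recognized within a radius that grows slowly with the number of years since the night watchman’s creation. The function and constants are invented, not part of the proposal.

```python
def claimable_radius_light_years(years_since_creation: float) -> float:
    """Toy schedule: claims are only recognized within a slowly growing radius.
    Starts at roughly the Solar System's scale and expands over decades, leaving
    the rest of the lightcone unclaimed for future, wiser decision-makers."""
    base = 0.001   # roughly the Solar System, in light years (order of magnitude)
    growth = 0.05  # additional light years unlocked per year (arbitrary)
    return base + growth * years_since_creation

def claim_is_legitimate(distance_light_years: float, years_since_creation: float) -> bool:
    return distance_light_years <= claimable_radius_light_years(years_since_creation)

# A probe sent to a star 4 light years away in year 10 would be stopped;
# the same claim in year 100 would fall within the unlocked region.
print(claim_is_legitimate(4.0, 10))    # False
print(claim_is_legitimate(4.0, 100))   # True
```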

 

Preventing other kinds of lock-in

In general, we should be pretty scared of permanent lock-in happening early in the transition to ASI. One type of lock-in is entrenched, AI-enforced authoritarianism.

 

Preventing underhanded negotiation tactics

Even after the night watchman is installed, the world won’t be fully stable. Countries will be building really impressive new technologies, doing stuff in space, etc. In the process, there will be lots of negotiation between different countries and centers of power. The night watchman should prevent underhanded negotiation tactics. For example, it should prevent extortion: if the United States is making a deal with Muslim countries, it shouldn’t be able to say “Sign this deal, or else we’ll draw a bunch of pictures of Muhammad.”

 

Arguing for the “three key properties” above

Above, I articulated three key properties of the night watchman proposal. Here I will argue for them briefly.

 

Getting everyone on board

First: I think it’s really important that the night watchman be built in a way where the major powers can verify that the ASI being built really will be built to keep the peace. Ideally, this would happen in two steps:

I don’t know if this is too much to expect, but I think it’s not crazy to expect a situation that’s about half as good as that, where the major powers trust the process mostly but not entirely. If there isn’t enough trust for this plan to go through, some alternative proposals might work instead (see here).

But even if the ASI-building process is really transparent, can a model spec for the night watchman really be agreed to by all major powers? I’m optimistic about that, for the basic reason that it keeps the peace in a time of perils. However, I expect there to be sticking points. For example, how would the night watchman address possible Chinese military actions in Taiwan?

My basic take is that it doesn’t seem too difficult to hammer out a compromise that is acceptable to all major powers. We saw a similar situation with the U.S. Constitution, where there were particular sticking points, both on the object level (what would happen with the slave trade?) and the meta level (equal or proportional representation of states?). A compromise on all these issues was reached because a union was strongly in the common interest of the states, and a wide range of compromises was better than no union at all. Ultimately, one was struck.

I’m imagining a similar situation, but this time with significant AI assistance for finding compromises.

 

Protecting humanity in the short run

I think it’ll be pretty easy for the night watchman to keep the peace, because it’ll be by far the most capable AI, and will make sure that the world stays that way (until it is amended or retired, see below).

 

No major lock-in

The night watchman is explicitly tasked with preventing lock-in, but could the creation of the night watchman in itself be a major lock-in event?

My intuition is that this can be avoided, because the night watchman’s role is pretty limited. It doesn’t decide how to allocate the universe or anything like that; to a first approximation, it just keeps the peace. So while lock-in might happen later, the hope is that it’ll happen at a time when humanity is wiser, more secure, and generally more capable of making reasoned decisions.

That said, I do think that certain specifications of the night watchman’s role might result in lock-in.[1] I haven’t thought through the details, but we should take care to avoid such specifications when hammering out details.

 

Interpretive details

Even if countries are mostly on board with the specific vision outlined above, there will no doubt be conflict when it comes to specific details. For specific details that are foreseeable at the time that the night watchman is created, those can be hammered out with explicit compromises (see above).

But in the medium term, I think it makes sense to establish a process to resolve ambiguities about what the night watchman should do. In the United States, this is the job of the courts (this is called statutory interpretation). And we could imagine a similar resolution mechanism, with a group of humans (or AIs, or AI-assisted humans) deciding what should happen. This leaves open the question of how these humans should be appointed, but I think reasonable compromises could be found and struck.

But also, we’re dealing with an ASI, and we should probably take advantage of that fact. We could give it instructions on how to resolve ambiguities in its rules. This might look something like:

 

Amending the night watchman’s goals

There should probably be a process for amending the night watchman’s duties, or even retiring the night watchman entirely. Doing so should probably be difficult: it should require a consensus of the world’s major powers. I’m not sure how best to specify the conditions required for amendment. My hope would be that these conditions wouldn’t “lock in” a current conception of the world and its major powers. For example, the amendment conditions shouldn’t mention the United States and China by name, because the U.S. and China might no longer be important entities in the world 10 or 100 years from the time of the night watchman’s creation.
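
One way to picture amendment conditions that avoid naming today’s powers: require assent from a supermajority of whoever the major powers are at the time of the vote, identified by a functional criterion rather than by name. The sketch below is a hypothetical illustration; the share-of-world-capacity threshold and the 90% supermajority are invented numbers.

```python
from typing import Dict, Set

def major_powers(capacity_share: Dict[str, float], threshold: float = 0.05) -> Set[str]:
    """Identify major powers functionally (e.g. share of world economic or military
    capacity above a threshold) rather than naming specific countries in advance."""
    return {actor for actor, share in capacity_share.items() if share >= threshold}

def amendment_passes(capacity_share: Dict[str, float], assenting: Set[str],
                     supermajority: float = 0.9) -> bool:
    """An amendment (or retirement of the night watchman) passes only with
    near-consensus of the major powers as they exist at the time of the vote."""
    powers = major_powers(capacity_share)
    if not powers:
        return False
    return len(powers & assenting) / len(powers) >= supermajority

# Toy example: whichever actors matter at the time are the ones whose assent counts.
shares_2125 = {"ActorA": 0.30, "ActorB": 0.25, "ActorC": 0.10, "ActorD": 0.02}
print(amendment_passes(shares_2125, assenting={"ActorA", "ActorB", "ActorC"}))  # True
print(amendment_passes(shares_2125, assenting={"ActorA", "ActorB"}))            # False
```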

 

Modifications to the basic proposal

(Thanks to Ashwin Acharya for many of the thoughts in this section.)

The proposal outlined above might or might not make sense in practice, depending on factors like how AI develops (e.g. hard vs. continuous takeoff) and geopolitical circumstances (e.g. who is ahead in the AI race, and by how much). However, I think the core idea of a powerful AI system designed to keep the peace is realistic under a wide range of circumstances, and can be adapted to the particular circumstances we end up encountering.

 

Multiple subsystems

Instead of there being one night-watchman ASI, maybe it will make more sense for there to be multiple AI systems with separate goals: one system protects from biological threats, another prevents the deployment of unsafe AI systems, another negotiates between countries to prevent war, and so on.

 

An American night watchman and a Chinese night watchman overseeing each other

If it’s too hard to build a single system that both the U.S. and China trust, you could imagine the U.S. and China agreeing to build their own systems. Maybe the Chinese night watchman oversees the U.S. and its allies, while the American night watchman oversees the rest of the world. This leaves open the question of how disagreements get resolved (e.g. if the American night watchman wants to prevent China from invading Taiwan, but the Chinese night watchman wants to stop the American night watchman from intervening). This is similar to the question of how ambiguities and conflicts get resolved by the singleton night watchman in my proposal above.

 

Keeping the peace through soft power

Above, I imagined that the night watchman has the intelligence and hard power necessary to prevent a major power like the United States from launching an invasion. Maybe that won’t be realistic, e.g. because countries won’t be willing to give the necessary resources to the night watchman. You could imagine that the night watchman uses soft power (e.g. diplomacy) to prevent war/invasion, rather than literally shooting down missiles.

 

Conventional treaties

In worlds where takeoff is fairly continuous but we don’t fully trust AIs to be aligned, you could imagine a more conventional treaty that allows for the world’s major actors to gradually build more and more powerful AIs, with enough transparency that each side’s training procedures can be verified by the other side.

 

Checks and balances

In the same way that the U.S. federal government is structured to have three branches that oversee each other, you could imagine the night watchman comprising multiple systems, each with a different role. For example, maybe one system decides what actions should be taken to keep the peace; another verifies that those actions are within the limits of the night watchman’s purview; another takes those actions.
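
Here is a minimal sketch of that separation of roles, with proposal, review, and execution handled by distinct components so that no single subsystem both decides and acts. The class and method names are hypothetical illustrations.

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    description: str
    within_purview: bool  # in reality, making this judgment is the hard part

class Proposer:
    """Decides what actions would keep the peace."""
    def propose(self) -> ProposedAction:
        return ProposedAction("disable weapons staged for an invasion", within_purview=True)

class Reviewer:
    """Independently verifies that a proposed action is within the night
    watchman's narrow mandate before anything happens."""
    def approve(self, action: ProposedAction) -> bool:
        return action.within_purview

class Executor:
    """Carries out only actions that the reviewer has approved."""
    def execute(self, action: ProposedAction) -> None:
        print(f"Executing: {action.description}")

# Wiring the three subsystems together: propose -> review -> execute.
proposer, reviewer, executor = Proposer(), Reviewer(), Executor()
action = proposer.propose()
if reviewer.approve(action):
    executor.execute(action)
```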

 

The night watchman as a transition

What actually happens after the night watchman is installed? One possibility is that countries will choose to form a world government, and the process will look pretty similar to the founding of the United States (with countries being analogous to states). The world government would decide on things like whether and how to build a Dyson sphere, and how to use the resulting energy. The night watchman would not prevent the formation of such a world government, assuming that it’s done non-coercively.

When I started thinking about this project, I was conceptualizing myself as trying to write something akin to a constitution for this world government. Most major actions would be taken by powerful AI systems, and the constitution would describe the process by which it would be decided which actions the AIs will take.[2]

But my current view is that that particular can can be kicked down the road. Will there be a world government? If so, what form will it take? What will its constitution look like? These are all really interesting questions, but ones that will be decided by people with AI advisors that are way smarter than me.

By contrast, I think it’s important to think through now how to set the stage for these sorts of post-ASI discussions to happen. Building a consensus around how we can get through this time of perils peacefully, in a way that’s acceptable to all major geopolitical actors, is a priority for today, because a concrete-ish proposal would ease tensions and set the stage for negotiations. This post was my attempt at sketching such a proposal.

 

  1. ^

    One example: if the night watchman’s model spec refers specifically to the U.S. and China, that might lock in the U.S. and China as playing important roles in the future, even if very few people live in those countries. This is similar to if an important treaty that gave significant power to the Vatican were still in force today. (Thanks to Rose Hadshar for this analogy.)

  2. ^

    This is analogous to how the U.S. Constitution describes the process by which it is decided which actions the U.S. executive branch takes.


