I took a week off from my day job of aligning AI to visit Forethought and think about the question: if we can align AI, what should we do with it? This post summarizes the state of my thinking at the end of that week. (The proposal described here is my own, and is not in any way endorsed by Forethought.)
Thanks to Mia Taylor, Tom Davidson, Ashwin Acharya, and a whole bunch of other people (mostly at Forethought) for discussion and comments.
And a quick note: after writing this, I was told that Eric Drexler and David Dalrymple were thinking about a very similar idea in 2022, with essentially the same name. My thoughts here are independent of theirs.
The world around the time of ASI will be scary
I expect the time right around when the first ASI gets built to be chaotic, unstable, and scary. This is true even if we fully solve the alignment problem, for a few reasons:
- Maybe access to powerful AIs will be pretty decentralized. In that case, small actors could commit major acts of bioterrorism and generally wreak havoc.
- Or maybe access to powerful AIs will be centralized, with one or a small number of entities controlling by far the most powerful AIs. This seems like a recipe for geopolitical conflict: actors that are behind might take drastic measures (like initiating a nuclear war) if the alternative is an enemy ending up with a decisive strategic advantage.
- Or maybe we’ll have some mix of these two worlds, with both kinds of threats.
Either way, I think the world’s #1 priority during this time should be existential security. In other words:
- Preventing major catastrophes (especially extinction-level catastrophes), while also
- Not forfeiting much of the value of the long-term future (e.g. by permanently locking in bad values or a bad system of government).
So, let’s say we find ourselves in such a world: we think we know how to build aligned AIs, but the world is still scary. What do we do?
My proposal is that a leading actor (or coalition of leading actors) build a night-watchman ASI. In one sentence, this means a superhuman AI system whose purview is narrowly scoped to maintaining world peace. The rest of this post elaborates on this proposal.
I think the specific proposal outlined below makes the most sense if the world looks something like this:
- Multiple actors (e.g. the U.S. and China) are racing to ASI. Conflict is escalating, and really bad outcomes like war seem possible (even if not that likely).
- It looks like takeoff will be fairly sudden: not necessarily as fast as in AI 2027, but people are expecting that we’ll go from “humans are mostly making the important decisions without too much AI assistance” to ASI within a year or two.
- Luckily, we’ve basically figured out how to train AIs in a way where we trust them to be aligned.
However, I think the proposal (or modifications of it) is workable in somewhat different worlds as well (see more here).
It may be helpful to think of the night-watchman ASI as the centerpiece of a US-China AI treaty that averts an all-out race to ASI. This isn’t the only way that we might get a night-watchman ASI, but it’s one of the more plausible ways.
The night-watchman ASI
The night-watchman state is a concept from political theory that was popularized by Robert Nozick. A night-watchman state is a form of government that:
- Protects people from rights violations (e.g. physical violence and theft); and
- Preserves its monopoly on violence (e.g. by dismantling militias that threaten to limit its ability to do #1).
Essentially, the night-watchman state is the minimal possible government that fulfills the basic duty of “keeping the peace”.
I think that in the world I describe above, it makes sense for an ASI to fulfill these basic duties, but at a geopolitical scale. (For the rest of this post, I’ll be calling this ASI the night watchman.) I’ll go into some more details later, but some central examples of the night watchman’s responsibilities are:
- Preventing countries from invading other countries.
- Preventing large-scale bioterror.
- Preventing actions that would take away its ability to protect the world (e.g. preventing the construction of comparably powerful ASIs that aren’t aligned to it).
  - This doesn’t place a permanent ceiling on the capabilities of other AIs, because the night watchman can and should self-improve.
Three key properties
I like this idea because I think some version of the idea has three key properties:
- It can get broad support from key actors (such as the U.S. and China): for some version of the proposal, no powerful actor will want to take actions to prevent the night watchman from existing.
- The night watchman will protect humanity in the short run, keeping the peace and setting the stage for something like a long reflection.
- The night watchman doesn’t lock ~anything in (and, as discussed later, will hopefully prevent lock-ins), so I don’t expect this step to reduce humanity’s ability to end up in a great future.
I will argue briefly for each of these points later, but first, I’ll elaborate on the night watchman’s responsibilities.
The night watchman’s responsibilities
Here’s a brief description of what I’m imagining the night watchman will do.
First and foremost: keeping the peace
Centrally, this means preventing large-scale geopolitical aggression (such as invasions of sovereign states) and catastrophes (such as a genocide or a bioterrorist releasing a virus). I don’t especially think that the night watchman needs to be involved in small crimes and disputes (such as one-off murders) -- those can be dealt with in conventional ways by nation-states.
Early on, it might make sense for the powers that build the night watchman to give it the forces and resources it needs to keep the peace. That said, I think the night watchman will generally be able to keep the peace through peaceful means. If it observes Russia preparing to invade Poland, it will tell Russia: “Hey, I see that you’re preparing to invade Poland. You won’t succeed, because I’m way more powerful than you.” At this point, it would be rational for Russia to back off, but if it doesn’t, the night watchman will destroy its weapons without injuring humans.
It’s possible that real-world compromises will need to be made to this “keeping the peace” ideal, in order to get the major powers on board. For example, maybe ideally there would be no Chinese invasion of Taiwan, but an explicit carve-out would be made in order to get China on board. This would be sad (in my view), but perhaps necessary.
There will also be edge cases, where it’s unclear whether something falls under the night watchman’s purview. More below on how to deal with edge cases.
Minimally intrusive surveillance
The night watchman will need to observe the world in enough detail that it can keep the peace. This is easy for large-scale threats like one country invading another. It’s a little trickier for threats like bioterrorism, but (my guess is) ultimately not that hard.
Preventing competing ASIs
The biggest threat to the night watchman’s ability to keep the peace is other ASIs. And so it’ll either prevent training runs that might create such ASIs, or audit the ASIs in order to ensure that they will not take actions that the night watchman would want to prevent. (This might involve extensive oversight of the training process.)
This might be a sticking point, because countries will likely want to keep building more powerful AI systems of their own. To make room for this, one of the night watchman’s responsibilities will be to recursively self-improve -- or to build more powerful (and aligned) versions of itself -- in order to raise the ceiling on AI capabilities that it considers safe.
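As a very rough sketch of what this gating could look like (everything here is a hypothetical illustration, not part of the proposal’s details): training runs expected to stay under the ceiling the night watchman currently considers safe get approved, runs above it either happen under audited oversight or get blocked, and the ceiling itself rises as the night watchman improves.

```python
def review_training_run(expected_capability: float,
                        safe_ceiling: float,
                        audited: bool) -> str:
    """Hypothetical gating rule for a proposed training run."""
    if expected_capability <= safe_ceiling:
        return "allow"            # below the current verified-safe ceiling
    if audited:
        return "allow with oversight"  # above the ceiling, but under audit
    return "block"                # above the ceiling with no oversight

def raise_ceiling(safe_ceiling: float,
                  night_watchman_capability: float,
                  margin: float = 0.8) -> float:
    """The ceiling rises as the night watchman self-improves, keeping a safety margin."""
    return max(safe_ceiling, margin * night_watchman_capability)
```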
Where will the night watchman get the resources to self-improve? I’m pretty agnostic about this point, but I think it might be reasonable to allow the night watchman to (ethically) participate in the world economy in a way that lets it gain resources.
Preserving its own integrity
The night watchman should prevent attempts to shut it down or modify its aims, except through procedures agreed upon in advance when the night watchman is created.
Preventing premature claims to space
Imagine that in 1700, England signed a treaty with the world’s other major powers that gave them parts of Canada in exchange for England’s claim to all of the Milky Way Galaxy outside of the Earth. I think that such a treaty should be considered illegitimate today, for basically two reasons:
- The major powers in 1700 don’t speak for current and future people.
- In an important sense, there wasn’t informed consent: the major powers in 1700 didn’t realize how much they were giving up.
And so, if some country (e.g. Singapore) tries to claim a large part of the lightcone, in exchange for natural resources on Earth or money or whatever, that should also be considered illegitimate. If Singapore tries to send out probes to colonize its claimed portion of the lightcone, the night watchman should stop it from doing so.
I don’t have a fleshed-out story of how exactly parts of space should be “unlocked” to claims over time, but I think that something like this is important to do.
Preventing other kinds of lock-in
In general, we should be pretty scared of permanent lock-in happening early in the transition to ASI. One type of lock-in is entrenched, AI-enforced authoritarianism.
Preventing underhanded negotiation tactics
Even after the night watchman is installed, the world won’t be fully stable. Countries will be building really impressive new technologies, doing stuff in space, etc. In the process, there will be lots of negotiation between different countries and centers of power. The night watchman should prevent underhanded negotiation tactics. For example, it should prevent extortion: if the United States is making a deal with Muslim countries, it shouldn’t be able to say “Sign this deal, or else we’ll draw a bunch of pictures of Muhammad.”
Arguing for the “three key properties” above
Above, I articulated three key properties of the night watchman proposal. Here I will argue for them briefly.
Getting everyone on board
First: I think it’s really important that the night watchman be built in a way that lets the major powers verify that the resulting ASI really will keep the peace. Ideally, this would happen in two steps:
- First, the major powers sign onto a compromise that details what the night watchman’s duties are. This is analogous to a model spec.
- Second, there will be really strong transparency that will give the major powers assurance that the AI hasn’t been backdoored and that the model spec that it was trained to follow was the agreed-upon one (see the sketch below for one small piece of what this verification might involve).
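To make the second step a bit more concrete, here’s a minimal sketch (in Python, with hypothetical names) of one small piece of that verification: each major power independently checks that the spec the training pipeline actually used matches the negotiated one, and deployment requires every party’s sign-off. Real assurance would of course require far more than this, e.g. audits of the training data and training process itself.

```python
import hashlib

def spec_fingerprint(spec_text: str) -> str:
    """Deterministic fingerprint of a model-spec document."""
    return hashlib.sha256(spec_text.encode("utf-8")).hexdigest()

def verify_deployment(agreed_spec: str, trained_spec: str, signoffs: dict) -> bool:
    """
    Passes only if (a) the spec actually used in training matches the
    negotiated spec byte-for-byte, and (b) every party has independently
    signed off after inspecting the training process.
    """
    specs_match = spec_fingerprint(agreed_spec) == spec_fingerprint(trained_spec)
    everyone_signed_off = len(signoffs) > 0 and all(signoffs.values())
    return specs_match and everyone_signed_off
```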
I don’t know if this is too much to expect, but I think it’s not crazy to expect a situation that’s about half as good as that, where the major powers trust the process mostly but not entirely. If there isn’t enough trust for this plan to go through, some alternative proposals might work instead (see here).
But even if the ASI-building process is really transparent, can a model spec for the night watchman really be agreed to by all major powers? I’m optimistic about that, for the basic reason that keeping the peace during a time of perils is in every major power’s interest. However, I expect there to be sticking points. For example, how would the night watchman address possible Chinese military actions in Taiwan?
My basic take is that it doesn’t seem too difficult to hammer out a compromise that is acceptable to all major powers. We saw a similar situation with the U.S. Constitution, where there were particular sticking points, both on the object level (what would happen with the slave trade?) and the meta level (equal or proportional representation of states?). A compromise was possible because a union was strongly in the common interest of the states, and a wide range of compromises was better than no union at all; ultimately, one was struck.
I’m imagining a similar situation, but this time with significant AI assistance for finding compromises.
Protecting humanity in the short run
I think it’ll be pretty easy for the night watchman to keep the peace, because it’ll be by far the most capable AI, and will make sure that the world stays that way (until it is amended or retired, see below).
No major lock-in
The night watchman is explicitly tasked with preventing lock-in, but could the creation of the night watchman in itself be a major lock-in event?
My intuition is that this can be avoided, because the night watchman’s role is pretty limited. It doesn’t decide how to allocate the universe or anything like that; to a first approximation, it just keeps the peace. So while lock-in might happen later, the hope is that it’ll happen at a time when humanity is wiser, more secure, and generally more capable of making reasoned decisions.
That said, I do think that certain specifications of the night watchman’s role might result in lock-in.[1] I haven’t thought through the details, but we should take care to avoid such specifications when hammering out details.
Interpretive details
Even if countries are mostly on board with the specific vision outlined above, there will no doubt be conflict over specific details. Details that are foreseeable at the time the night watchman is created can be hammered out with explicit compromises (see above).
But in the medium term, I think it makes sense to establish a process to resolve ambiguities about what the night watchman should do. In the United States, this is the job of the courts (this is called statutory interpretation). And we could imagine a similar resolution mechanism, with a group of humans (or AIs, or AI-assisted humans) deciding what should happen. This leaves open the question of how these humans should be appointed, but I think reasonable compromises could be found and struck.
But also, we’re dealing with an ASI, and we should probably take advantage of that fact. We could give it instructions on how to resolve ambiguities in its rules. This might look something like:
- Simulate such-and-such panel of people and see what agreement they would come to; or
- Do what a fair bargain between the world powers would be, in proportion to how much power they have (but probably better specified than that; see the toy sketch below).
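Here’s a toy sketch of the second option, just to illustrate the shape of “a fair bargain in proportion to power”. The parties, options, approval scores, and power weights are all hypothetical placeholders; a real specification would need to say where the weights come from and how candidate interpretations are generated.

```python
def resolve_ambiguity(options, approval, power_share):
    """
    Pick the option that maximizes power-weighted approval.

    options:      list of candidate interpretations of an ambiguous rule
    approval:     approval[party][option] in [0, 1], how acceptable each
                  option is to each party
    power_share:  power_share[party] in [0, 1], summing to 1
    """
    def weighted_score(option):
        return sum(power_share[p] * approval[p][option] for p in power_share)

    return max(options, key=weighted_score)

# Hypothetical usage with placeholder parties and options.
options = ["interpretation_a", "interpretation_b"]
approval = {
    "party_1": {"interpretation_a": 0.9, "interpretation_b": 0.2},
    "party_2": {"interpretation_a": 0.4, "interpretation_b": 0.8},
}
power_share = {"party_1": 0.5, "party_2": 0.5}
print(resolve_ambiguity(options, approval, power_share))  # -> "interpretation_a"
```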
Amending the night watchman’s goals
There should probably be a process for amending the night watchman’s duties, or even retiring the night watchman entirely. Doing so should probably be difficult: it should require a consensus of the world’s major powers. I’m not sure how best to specify the conditions required for amendment. My hope would be that these conditions wouldn’t “lock in” a current conception of the world and its major powers. For example, the amendment conditions shouldn’t mention the United States and China by name, because the U.S. and China might no longer be important entities in the world 10 or 100 years from the time of the night watchman’s creation.
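To illustrate what “not naming the U.S. and China” might look like in practice, here’s a purely hypothetical sketch of an amendment condition phrased in terms of whichever actors hold power at the time, rather than in terms of specific states. The particular measures (population share, compute share) and thresholds are placeholders, not a proposal.

```python
def amendment_passes(supporters, population_share, compute_share,
                     population_threshold=0.75, compute_threshold=0.75):
    """
    One hypothetical way to define amendment conditions without naming
    specific countries: require that the supporting coalition represents a
    supermajority of world population AND of relevant AI compute, whoever
    those actors happen to be at the time.
    """
    pop = sum(population_share.get(s, 0.0) for s in supporters)
    compute = sum(compute_share.get(s, 0.0) for s in supporters)
    return pop >= population_threshold and compute >= compute_threshold
```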
Modifications to the basic proposal
(Thanks to Ashwin Acharya for many of the thoughts in this section.)
The proposal outlined above might or might not make sense in practice, depending on factors like how AI develops (e.g. hard vs. continuous takeoff) and geopolitical circumstances (e.g. who is ahead in the AI race, and by how much). However, I think the core idea of a powerful AI system designed to keep the peace is realistic under a wide range of circumstances, and can be adapted to the particular circumstances we end up encountering.
Multiple subsystems
Instead of there being one night-watchman ASI, maybe it will make more sense for there to be multiple AI systems with separate goals: one system protects from biological threats, another prevents the deployment of unsafe AI systems, another negotiates between countries to prevent war, and so on.
An American night watchman and a Chinese night watchman overseeing each other
If it’s too hard to build a single system that both the U.S. and China trust, you could imagine the U.S. and China agreeing to build their own systems. Maybe the Chinese night watchman oversees the U.S. and its allies, while the American night watchman oversees the rest of the world. This leaves open the question of how disagreements get resolved (e.g. if the American night watchman wants to prevent China from invading Taiwan, but the Chinese night watchman wants to stop the American night watchman from intervening). This is similar to the question of how ambiguities and conflicts get resolved by the singleton night watchman in my proposal above.
Keeping the peace through soft power
Above, I imagined that the night watchman has the intelligence and hard power necessary to prevent a major power like the United States from launching an invasion. Maybe that won’t be realistic, e.g. because countries won’t be willing to give the necessary resources to the night watchman. You could imagine that the night watchman uses soft power (e.g. diplomacy) to prevent war/invasion, rather than literally shooting down missiles.
Conventional treaties
In worlds where takeoff is fairly continuous but we don’t fully trust AIs to be aligned, you could imagine a more conventional treaty that allows for the world’s major actors to gradually build more and more powerful AIs, with enough transparency that each side’s training procedures can be verified by the other side.
Checks and balances
In the same way that the U.S. federal government is structured to have three branches that oversee each other, you could imagine the night watchman comprising multiple systems, each with a different role. For example, maybe one system decides what actions should be taken to keep the peace; another verifies that those actions are within the limits of the night watchman’s purview; another takes those actions.
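Here’s a minimal sketch of that division of roles, with hypothetical names. The point is just the structure -- no action gets carried out unless an independent subsystem confirms it falls within the agreed purview -- not the specific checks.

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    description: str
    justification: str

def propose(threat_report: str) -> ProposedAction:
    """Subsystem 1: decides what action would keep the peace."""
    return ProposedAction(description=f"respond to: {threat_report}",
                          justification="peacekeeping")

def within_purview(action: ProposedAction, purview_rules: list) -> bool:
    """Subsystem 2: independently checks the action against the agreed purview."""
    return action.justification in purview_rules

def execute(action: ProposedAction) -> None:
    """Subsystem 3: carries out the action only after the purview check passes."""
    print(f"executing: {action.description}")

# Hypothetical usage: the roles are separated so that no single subsystem
# can both decide on and carry out an out-of-scope action.
purview_rules = ["peacekeeping"]
action = propose("large-scale weapons buildup detected")
if within_purview(action, purview_rules):
    execute(action)
```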
The night watchman as a transition
What actually happens after the night watchman is installed? One possibility is that countries will choose to form a world government, and the process will look pretty similar to the founding of the United States (with countries being analogous to states). The world government would decide on things like whether and how to build a Dyson sphere, and how to use the resulting energy. The night watchman would not prevent the formation of such a world government, assuming that it’s done non-coercively.
When I started thinking about this project, I was conceptualizing myself as trying to write something akin to a constitution for this world government. Most major actions would be taken by powerful AI systems, and the constitution would describe the process by which it would be decided which actions the AIs will take.[2]
But my current view is that that particular can can be kicked down the road. Will there be a world government? If so, what form will it take? What will its constitution look like? These are all really interesting questions, but ones that will be decided by people with AI advisors that are way smarter than me.
By contrast, I think it’s important to think through now how to set the stage for these sorts of post-ASI discussions to happen. Building a consensus around how we can get through this time of perils peacefully, in a way that’s acceptable to all major geopolitical actors, is a priority for today, because a concrete-ish proposal would ease tensions and set the stage for negotiations. This post was my attempt at sketching such a proposal.
[1] One example: if the night watchman’s model spec refers specifically to the U.S. and China, that might lock in the U.S. and China as playing important roles in the future, even if very few people live in those countries. This would be similar to an important treaty that gave significant power to the Vatican still being in force today. (Thanks to Rose Hadshar for this analogy.)
[2] This is analogous to how the U.S. Constitution describes the process by which it is decided which actions the U.S. executive branch takes.