Published on July 21, 2025 5:54 AM GMT
"Rules" are a critical social technology for helping people live and work together in peace. From the laws passed by legislatures to govern a whole nation, to the bylaws of a neighborhood homeowner association, to the informal household rules of a single family, explicit rules make it clear to everyone what behavior is required and what behavior is forbidden, without otherwise controlling every minute detail of everyone's behavior.
When there are clear rules, people don't have to drive themselves crazy contorting themselves into unnatural shapes to satisfy the whims of some distant Authority. All you have to do is make sure to obey the rules. With that taken care of, you can go about living your life the way you see fit, in freedom and dignity. As can be attested in the annals of human experience from the time of Hammurabi into the present day, it mostly works pretty great—at least compared to the alternatives. In summary, rules are good. It's good to have clear rules, and for people to obey the rules.
Normal people understand this pretty well and probably don't need to read a blog post about it, but some people who aren't normal have a theoretical objection. The space of all possible behaviors is unthinkably vast. What if the formidable intelligence of an adversary who hates everything our Society stands for comes up with a behavior that's really bad but isn't forbidden by any of Society's rules?
The normal person is unfazed by the theoretical objection. If that happens, you could just make a new rule forbidding that behavior, right? How hard could that be?
The people who aren't normal are unimpressed with this reply. They can tell that the normal person doesn't understand the vastness of the space of possible behaviors at all. If you just make a new rule, surely the formidable intelligence of the adversary will contrive some other eldritch behavior that minimizes Society's utility function while complying to the letter of all of Society's rules. The theory of nearest unblocked strategies in the lore of AGI alignment, and the specter of specification gaming in the practice of ML engineering, make it clear that this is so. Thus, rules won't suffice; we need to empower leaders with the Authority to make judgement calls—even to control the minute details of anyone's behavior, if that's what it takes to safeguard Society's Values.
Now me, I'm normal on my mother's side, which puts me in a good position to understand what both parties to the disagreement are saying. And while my full belief-state about related topics in the theory of decision and optimization is nuanced and complex, on the narrow question of what to do about rules in human Society, I think the normal people have it basically right, and the people who aren't normal are being scared of ghosts. Let me explain.
I do not dispute the lore of AGI alignment, nor the practice of ML engineering. But crucially, the purpose of rules in human Society is highly disanalogous to the purpose of a utility or reward function in AI. Rules aren't supposed to express Society's true Values, let alone be a perfect specification robust to nearest unblocked strategies. The Values live in the hearts of Society's individual women and men, to be expressed in the way they go about living their lives the way they see fit, in freedom and dignity. The rules are just there to stop ourselves from trying to kill each other when your freedom and dignity is getting in the way of my freedom and dignity, so that we can focus on creating Value instead of wasting effort trying to kill each other.
Rules are written to ensure conditions conducive to people living their lives in freedom and dignity when those conditions wouldn't obtain in the absence of a rule. Traffic laws make it clear to everyone when it's safe to enter the road. If everyone just entered the road whenever they felt like it, that would be dangerous, and the danger would interfere with people living their lives in freedom and dignity.
The theory of nearest unblocked strategies can be relevant to rules in human Society to the extent that the conditions that a rule is intended to ensure are something that some people oppose either terminally or due to strong instrumental convergence. Income tax laws are passed so that the government will have money to fund police to enforce all the other laws, but that money has to come from somewhere and people really don't like having less money, so they put the full force of their effort and ingenuity into side-stepping the law with clever nearest unblocked strategies: underreporting cash transactions, hiding money in offshore accounts, recategorizing consumption as business expenses, &c.
But more often, the conditions that a rule is intended to ensure aren't something that people terminally or convergently-instrumentally oppose. The rule merely restricts behavior that people would otherwise engage in instrumentally, but not convergently instrumentally: if the rule is in place, they can and will avoid the behavior in order to comply with the rule.
Lead paint is an environmental hazard, so the United States banned it for residential use in 1978. Because of the ban, paint manufacturers stopped making lead paint. The paint manufacturers did not put the full force of their effort and ingenuity into clever nearest unblocked strategies for increasing the amount of lead in the environment, because they're not environmental lead maximizers, which aren't a real thing. The paint manufacturers just wanted to make paint. When there wasn't a rule against it, they used lead carbonate in their paint because it was convenient, but when there was a rule against it, they stopped. The rule worked—without the need for empowering an Authority to make judgement calls controlling the minute details of everyone's behavior. Why wouldn't it?
In some situations, there might be weak instrumental convergence pressures such that the first attempt at making a rule doesn't quite succeed at ensuring the conditions that the rule was meant to ensure. It turns out that, on further consideration, Society doesn't just want to avoid environmental contamination with lead in particular, but all other toxic heavy metals, too, some of which also happen to be convenient for making paint. So paint manufacturers still ended up using mercury in some paints until 1991 when that was banned, too. But once it was banned, they stopped. Why wouldn't they? They're not environmental mercury maximizers, either, which also aren't a real thing.
The work of coming up with rules to ensure socially beneficial outcomes can be frustrating, because you won't always get the rules exactly right the first time. You might need to iterate. But it's a finite and achievable amount of work, not an unwinnable unending battle against the formidable intelligence of an adversary who hates everything your Society stands for, because those mostly aren't a real thing either.
In conclusion, I think that people who think rules are unworkable and instead want to empower an Authority to make judgement calls controlling the minute details of everyone's behavior need to read less science fiction and spend more time relating to other people in their Society as people. Notwithstanding that terrifying alien superintelligences couldn't be constrained by rules because a merely human intellect lacks the capabilities to enumerate all the nearest unblocked strategies, other people in your Society are not terrifying alien superintelligences. We're just people who don't have exactly the same preferences as you. We won't always agree, but it shouldn't be this hard to live in peace with each other. If there are problems, you can just make a new rule!
(Thanks to Robert Mushkatblat and Ben Pace.)