Epistemic status: shower thought quickly sketched, but I do have a PhD in this.
As we approach AGI and need to figure out what goals to give it, we will need to find tractable ways to resolve moral disagreement. One of the most intractable moral disagreements is between the moral realists and the moral antirealists.
There's an oversimplified view of this disagreement that goes:
- If you're a moral realist, you want to align AGI to the best moral-epistemic deliberative processes you can find to figure out what is right
- If you're a moral antirealist and you're a unilateralist, you want to stage a coup and tile the world with your values
- If you're a moral antirealist and you're cooperative, you want to align AGI to a democratic process that locks in whatever values people have today forever
This oversimplified picture gets a lot of play, but it turns moral disagreement into a wholly intractable dispute about philosophical methodology, one on which we haven't progressed beyond Plato's Euthyphro in the last 2,400 years. The most important lesson from the positivists and the history of science is that the way you make progress on philosophical disagreements is to make them empirically tractable.
The moral realist / antirealist debate really runs together two distinct questions:
1. Is there a REAL TRUTH about morality? (An intractable "external" question.)
2. Empirically, what moral values would humans accept if we each went through a deliberative process of encountering lots of arguments and updating our moral values in the direction that aligns with our (broadly construed) theoretical preferences? (A tractable "internal" question.)
To (1), the moral realist answers "yes" and the moral antirealist says "I don't know what you're talking about, can you please make this clearer?"
But (1) is practically inert. It doesn't make a difference to what practical actions we take, just whether we baptize them with capital letters.
(2) is empirically tractable, practically relevant, and something that realists and antirealists could in principle agree upon. For example:
- An antirealist Kantian and a naturalist realist could agree that all humans will converge on the same moral ideals at the end of a certain kind of reflection, but disagree about whether this is the REAL TRUTH.
- An antirealist Humean and a non-naturalist realist could agree that, empirically, human values are way too incommensurable to achieve any kind of convergence, but the realist thinks that one set is still REALLY TRUE and the Humean disagrees.
If we set aside the external question, we can arrive at a set of value-neutral, philosophically ecumenical empirical hypotheses about moral epistemology that allow us to make tractable progress on what we should align AGI to without having to make any progress on the realism vs antirealism debate whatsoever. You can just Taboo Your Words.
Moral epistemology naturalized
Here are some empirical psychological hypotheses we could consider, building on one another:
MORAL REASONING IS REAL: Are humans such that we can be brought through a series of arguments, endorse all of the premises in those arguments, and end up with radically different moral views from where we started, views that we are satisfied with?
If you think that, empirically, humans tend to intrinsically value things in state space, then you'll think no. If you think that, empirically, humans tend to intrinsically value deliberative processes, you'll think yes.
FAULTLESS CONVERGENCE IS POSSIBLE: Could we find such a series of arguments that is convincing to everyone, such that we all arrive in the same place?
If you think that, empirically, we all share enough of the same values about deliberative processes, then you'll think yes. If you think that at least some of us don't share those values about deliberative processes, you'll think no.
UNIQUENESS: Is there one unique such series of arguments? (As opposed to the view that there is at least one series of arguments we would all agree on, but also other series of arguments that we would happily accept but would make us diverge — a kind of non-uniqueness thesis.)
If you think that the arguments humans would find acceptable would, empirically, push in only one direction rather than admit multiple stable equilibria, then you'll accept Uniqueness. Otherwise you'll accept Non-Uniqueness.
SEMI-UNIQUENESS: If non-uniqueness, is there a unique series of arguments that would maximally satisfy everyone's preferences over theoretical choices, broadly construed?
If you think that there are multiple stable equilibria for human moral reasoning, that some of these paths have a higher degree of theoretical preference satisfaction than others, and that one of these paths has the highest degree of theoretical preference satisfaction for everyone, you'll accept Semi-Uniqueness. (This is fairly value-laden, and we'd need to be more precise about what "theoretical preference satisfaction" amounts to and how to aggregate it if we wanted to make this empirically tractable.)
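One minimal sketch of how that precision could go (the notation is illustrative, and the choice of aggregator is itself one of the value-laden decisions just flagged): let $P$ be the set of stable deliberative paths posited by Non-Uniqueness, and let $s_i(p)$ be the degree to which path $p$ satisfies person $i$'s theoretical preferences, broadly construed. Then Semi-Uniqueness is the claim that

$$\left|\arg\max_{p \in P} \sum_i s_i(p)\right| = 1,$$

i.e. exactly one path maximizes aggregate theoretical preference satisfaction. Swapping the sum for a minimum or some other aggregation rule gives a different, equally contestable precisification, which is part of what makes this hypothesis value-laden.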
UNIFICATION: Can this set of arguments be described coherently as a unified "process" that we could understand and write down, or is it merely an incoherent hodgepodge of ideas?
This is again fairly value-laden, and we'd need to be more precise to make this an empirically tractable question.
I'm not necessarily saying that these are highly tractable questions, but (made suitably precise) they are questions that have empirical answers we could find out with a sufficiently advanced study of empirical psychology, and they are the kind of hypotheses that we can update on based on empirical data, unlike realism and antirealism. This also makes them the kinds of questions we know how to solve, and could solve with the help of AI, unlike the external question whether morality is REALLY TRUE.
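To show that these hypotheses really are the kind of thing you can operationalize, here is a deliberately silly toy model (the numbers and the acceptance rule are made up, and this is not a psychological claim): agents sit at points on a value line, an "argument" proposes a new point, and an agent accepts a proposal only if it asks for a revision within that agent's per-step tolerance. Even this crude setup lets you ask the questions above of a concrete system.

```python
import itertools

# Toy model, purely illustrative: each agent's values are a point on a line,
# an "argument" is a proposed new point, and an agent accepts an argument
# only if it asks for a revision within that agent's per-step tolerance.

AGENTS = {"a": 0.0, "b": 1.0, "c": 0.4}   # starting values
TOLERANCE = 0.5                            # largest revision an agent will accept in one step
ARGUMENTS = [0.2, 0.5, 0.8]                # positions the arguments push toward

def run(sequence, start):
    """Walk one agent through a sequence of arguments, accepting only small revisions."""
    position = start
    for proposal in sequence:
        if abs(proposal - position) <= TOLERANCE:
            position = proposal
    return position

def endpoints(sequence):
    """Where every agent ends up after hearing this sequence of arguments."""
    return {name: run(sequence, start) for name, start in AGENTS.items()}

# MORAL REASONING IS REAL (toy version): can some sequence move an agent far
# from where it started, with every step individually acceptable to it?
largest_shift = max(abs(run(seq, AGENTS["a"]) - AGENTS["a"])
                    for seq in itertools.permutations(ARGUMENTS))
print("Largest accepted revision for agent a:", largest_shift)

# FAULTLESS CONVERGENCE: is there a sequence on which everyone ends up together?
convergent = [seq for seq in itertools.permutations(ARGUMENTS)
              if len(set(endpoints(seq).values())) == 1]
print("Faultless convergence possible:", bool(convergent))

# UNIQUENESS: do all convergent sequences land everyone in the same place?
final_points = {endpoints(seq)["a"] for seq in convergent}
print("Unique convergence point:", len(final_points) == 1, final_points)
```

In this toy world, Moral Reasoning is Real and Faultless Convergence both hold, but Uniqueness fails: the order in which arguments arrive determines which equilibrium everyone lands on. Whether anything like that is true of actual human moral psychology is exactly the empirical question.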
Implications
Depending on the choices you make at these points, you'll adopt different views on exactly what process we should align AGI to. For example:
1. If you reject Moral Reasoning is Real, so you think people value things rather than deliberative processes, and you also think people value different things, and you also want to cooperate with other humans, then you'll want to find an alignment procedure that gets all humans as much of what they want as possible. Something like an idealized form of quadratic voting that fully captures the cardinality in people's moral preferences (the plain version is sketched just after this list).
2. If you accept Moral Reasoning is Real, but reject all of the other principles, then you'll at least want deliberative processes to be part of the thing we align AGI to. The best version of this might be an idealized form of quadratic voting that is over not only states of the world but also ways to reason about value. Or perhaps you'll want to use AGI to help people speed-run their own moral reasoning processes before doing (1) above, depending on which is more tractable.
3. If you accept that Moral Reasoning is Real and Faultless Convergence is Possible but reject the other claims, then, depending on the other details, you might think that we should just go ahead and converge to one of the sets of moral rules that we'd all happily converge to, since this would massively ease coordination. Or if this would lead to too much loss in people's theoretical preference loss landscape, you might go back to (2).
4. If you accept everything above (including either Uniqueness or Semi-Uniqueness), you'll want to use AGI to find the one unique deliberative process that all of us are going to find we like the best.
5. If you accept everything above (including either Uniqueness or Semi-Uniqueness) but you reject Unification, then you'll want to do (4), but you'll think this is a much messier and more complicated process than finding a single well-described way of reasoning about morality, which will have implications for how you go about it.
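For concreteness, here is the plain, textbook version of quadratic voting; the voters, options, and numbers are invented for illustration, and the "idealized form" in options (1) and (2) would need to do much more than this. Each voter spends a fixed budget of voice credits across options, and an option receives the square root of the credits each voter spent on it.

```python
import math

# Textbook quadratic voting over candidate value-systems, not the idealized
# form gestured at above; the voters, options, and numbers are made up.
# Each voter spends a fixed budget of voice credits; the effective votes an
# option receives from a voter are the square root of the credits spent on
# it, so intense preferences can be expressed but at quadratically rising cost.

BUDGET = 100  # voice credits per voter

ballots = {
    "voter_1": {"hedonic utilitarianism": 81, "kantianism": 19},
    "voter_2": {"kantianism": 100},
    "voter_3": {"virtue ethics": 64, "hedonic utilitarianism": 36},
}

def tally(ballots, budget):
    """Sum sqrt(credits) per option, checking that nobody overspends."""
    totals = {}
    for voter, allocation in ballots.items():
        assert sum(allocation.values()) <= budget, f"{voter} overspent"
        for option, credits in allocation.items():
            totals[option] = totals.get(option, 0.0) + math.sqrt(credits)
    return totals

for option, votes in sorted(tally(ballots, BUDGET).items(), key=lambda kv: -kv[1]):
    print(f"{option}: {votes:.1f} effective votes")
```

The square root is what captures cardinality: spending four times the credits only doubles your effective votes, so voters are pushed to reveal the relative intensity of their preferences rather than just their top-ranked option.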
How to reject my view
The main reason I could see someone rejecting my view goes as follows: "man, moral epistemology is so deeply pre-paradigmatic that all we can do is wander around in the wilderness until we figure out what's going on; it's like if medieval peasants tried to think about physics from the armchair!"
If you have that view, then you're not going to want to hang your hat on a particular way of carving up the moral-epistemic landscape and you'll probably be skeptical of my attempt to naturalize the question. But this is still a broadly empirical view that is susceptible to broadly empirical evidence from the history of science and successful theorizing — just look at the analogy you made!
Moreover, even with a view this despairing, there may well still be tractable, action-guiding implications for what you should align AGI to! If you think morality is this pre-paradigmatic, you might worry that we will use AGI to lock in the wrong reasoning processes before we have the requisite million years to wander around in the wilderness and gaze long into the Abyss and stumble around in darkness for the right way to answer the question. And then perhaps your best bet is to try to create emulations of all human brains and run them very quickly to speed-run the trip through the desert. That's what I would do if I were this confused, and it seems like a very robust process regardless of your views on all of the above.