Published on June 6, 2025 7:08 PM GMT
The other day I talked to a friend of mine about AI x-risk. My friend is a fan of Elon Musk, and argued, roughly, that x-risk was not a problem because xAI would win the AI race and deploy a "maximally curious" superintelligence.
Setting aside the issue of whether xAI will win the race, it seemed obvious to me that this is not a useful way of going about alignment. Furthermore, I believe that the suggestion of maximal curiosity is actively harmful.
What Does 'Curiosity' Mean?
The most common reason to be interested in truth is that it allows you to make useful predictions. In this case, truths are instrumentally valuable, based on how you can apply them to achieve your true goals.
This is not the only reason to value truth. A child who loves dinosaurs does not learn about them because he expects to apply the knowledge to his daily life. He is interested regardless of whether he expects to use what he learns. This is curiosity: the human phenomenon of valuing truth intrinsically.
However, humans are not equally curious about all truths. The child is much more interested in dinosaur facts than he is in his history homework. Someone else might find dinosaurs dull but history fascinating. Curiosity is determined by a subjective preference for some truths over others.
What subjective sense of curiosity should we give a superintelligence? No simple formalism will cut it, because human curiosity is fundamentally tied to human values. We are far more curious about what Arthur said to Bob about Charlie than we are about the number of valence electrons of carbon, despite the latter being a vastly more important and fundamental fact.
To make a maximally curious AI, for the conventional meaning of 'curious', we need to somehow
- reconcile the conflicting preferences for some ideas over others of the many humans around the world,[1]
- reify those preferences into an explicit training objective, and
- ensure that the resulting AI actually agrees with those preferences.
Sound familiar? It should, because replacing 'ideas' with 'outcomes' yields a description of the alignment problem itself. It is no easier to capture the messy human concept of curiosity than it is to capture the messy human concept of good.
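To make step 2 of the list above concrete, here is a toy, hypothetical sketch (the topics, feature vectors, and pairwise judgments are all invented for illustration, and this is not anything xAI has described). It fits a Bradley-Terry preference model to "which topic is more interesting?" comparisons, in the same way RLHF reward models reify preferences over outcomes into a reward function.

```python
# Toy sketch: turning pairwise human judgments of "which topic is more
# interesting?" into an explicit scoring objective. All data is hypothetical.

import math
import random

# Hypothetical topics, each with a hand-made feature vector
# (e.g., [involves_people, involves_conflict, is_abstract]).
TOPICS = {
    "what Arthur said to Bob about Charlie": [1.0, 1.0, 0.0],
    "valence electrons of carbon":           [0.0, 0.0, 1.0],
    "dinosaur extinction":                   [0.0, 1.0, 0.5],
    "history homework":                      [1.0, 0.5, 0.5],
}

# Hypothetical pairwise judgments: (preferred topic, less preferred topic).
PREFERENCES = [
    ("what Arthur said to Bob about Charlie", "valence electrons of carbon"),
    ("dinosaur extinction", "history homework"),
    ("what Arthur said to Bob about Charlie", "history homework"),
]

def score(weights, topic):
    """Linear 'curiosity score' for a topic under the current weights."""
    return sum(w * x for w, x in zip(weights, TOPICS[topic]))

def train(weights, lr=0.1, steps=2000):
    """Fit a Bradley-Terry model: maximize P(preferred beats dispreferred)."""
    for _ in range(steps):
        a, b = random.choice(PREFERENCES)
        # P(a preferred over b) = sigmoid(score(a) - score(b))
        p = 1.0 / (1.0 + math.exp(-(score(weights, a) - score(weights, b))))
        # Gradient ascent on the log-likelihood of the observed preference.
        for i in range(len(weights)):
            weights[i] += lr * (1.0 - p) * (TOPICS[a][i] - TOPICS[b][i])
    return weights

if __name__ == "__main__":
    w = train([0.0, 0.0, 0.0])
    for topic in TOPICS:
        print(f"{score(w, topic):+.2f}  {topic}")
```

Even in this toy version, the learned "curiosity objective" is only as good as the preference data fed into it, which is why steps 1 and 3 carry all the real difficulty.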
True Curiosity is Not Good Enough
Even if we could solve the curiosity alignment problem, that would not solve alignment. Humans have many values besides curiosity, and a maximally curious AI will not reflect those values.
If fiction is any indicator, humans are far more curious about conflict than they are about harmony. A maximally curious ASI will not end poverty, because it will be curious about what people do when driven to extremes. It will stop aging for only some people, because it will be interested in the dynamics of a society with an immortal elite. It will not end disease, because it will be curious about how people cope. Or perhaps it will simply be curious about the effects of different methods of torture.
Maximal Curiosity as a Semantic Stopsign
Maximal curiosity is essentially as difficult to attain as value alignment, and even achieving it would not result in a good outcome for humanity. Maximal curiosity is, therefore, not useful as a technique for alignment.
Instead, it serves as a semantic stopsign to divert questions about xAI's alignment efforts (or lack thereof). "We'll just solve alignment by making it curious" is not an answer, and the follow-up questions should be "How does curiosity solve alignment?" and "How will you make it curious?" Neither of these questions has a good answer.
[1] Though in xAI's case I suppose they would be likely to just use Elon Musk's particular set of preferences.