Published on July 30, 2025 7:15 PM GMT
My paper "A Timing Problem for Instrumental Convergence", co-authored with Helena Ward and Jen Semler, was recently accepted for publication in Philosophical Studies as part of a special issue on superintelligent robots (open access). The paper argues that instrumental rationality does not require goal preservation (also called goal-content integrity or goal stability). Here is the abstract:
Those who worry about a superintelligent AI destroying humanity often appeal to the instrumental convergence thesis—the claim that even if we don’t know what a superintelligence’s ultimate goals will be, we can expect it to pursue various instrumental goals which are useful for achieving most ends. In this paper, we argue that one of these proposed goals is mistaken. We argue that instrumental goal preservation—the claim that a rational agent will tend to preserve its goals because that makes it better at achieving its goals—is false on the basis of the timing problem: an agent which abandons or otherwise changes its goal does not thereby fail to take a required means for achieving a goal it has. Our argument draws on the distinction between means-rationality (adopting suitable means to achieve an end) and ends-rationality (choosing one’s ends based on reasons). Because proponents of the instrumental convergence thesis are concerned with means-rationality, we argue, they cannot avoid the timing problem. After defending our argument against several objections, we conclude by considering the implications our argument has for the rest of the instrumental convergence thesis and for AI safety more generally.