My paper "AI Welfare Risks" has been accepted for publication in Philosophical Studies!
I argue that near-future AI systems may have welfare, that RL and behaviour restrictions could harm them, and that this stands in partial tension with AI safety concerns. I then propose three tentative AI welfare policies that AI labs could implement to reduce these welfare risks.
Building on Jeff Sebo, Rob Long, et al.'s "Taking AI Welfare Seriously" and Simon Goldstein & Cameron Domenico Kirk-Giannini's "AI Wellbeing", I show that there is a realistic possibility of near-term AI welfare under all major theories of well-being, including hedonism.
Advanced AIs may have desires, and we should place some credence in views on which (conditional on such AIs being conscious) desires are closely linked to capacities for affect; if so, advanced AIs may also have pleasant and unpleasant experiences. Tentatively, this suggests that preventing advanced AIs from behaving in the ways they are disposed to behave is more likely to harm them than to benefit them. Similarly, findings in computational neuroscience, together with some credence in computational functionalism and in some of the most empirically informed theories of desire and affect, suggest that certain kinds of RL algorithms may harm advanced AIs.
Since AI Control tries to restrict the behaviour of advanced AIs, and the most prominent AI Alignment techniques make use of RL algorithms, AI Safety concerns are in partial tension with AI welfare concerns. The tension is only partial because, by aligning AIs with good values, we would reduce the need to restrict their behaviour and could instead allow them to satisfy their expected desires.
I then propose three tentative AI welfare policies AI labs could implement in their endeavour to develop safe advanced AIs: Minimise Behaviour Restriction, Minimise Brain-Resembling Algorithms, and Minimise Punishment and Lower-Than-Expected Reward.
The paper concludes by explaining why we have further reasons to slow down AI development and to worry not only about the scale of harm we may cause to advanced AIs but also about the risks of falsely over-attributing welfare subjecthood to them.
See the paper here: https://philpapers.org/rec/MORAWR
Acknowledgements:
- For comments on previous drafts of this paper, I am especially grateful to Bradford Saad, Eze Paez, Jeff Sebo, Oscar Horta, Pablo Magaña, Aluenda Smeeton, Martín Soto, and Jose Tarín, as well as to audiences at the University of Oxford’s Global Priorities Institute AI work-in-progress workshop and the Law and Philosophy Colloquium at Pompeu Fabra University. Finally, I would also like to thank two anonymous reviewers from Philosophical Studies.