AI Welfare Risks

This post examines potential welfare problems for near-future AI systems, arguing that reinforcement learning and behaviour restrictions could harm such AIs, and that this stands in partial tension with AI safety goals. Drawing on hedonism and the other major theories of well-being, it argues that near-term AI welfare is a realistic possibility. On the basis of findings in computational neuroscience, together with some credence in computational functionalism, the author argues that certain reinforcement learning algorithms may harm advanced AIs. The post then proposes three tentative AI welfare policies that AI labs could implement to reduce these welfare risks, and calls for caution about the pace of AI development as well as about the risk of over-attributing welfare subjecthood.

🤔 AI systems may have welfare: the post argues that there is a realistic possibility of near-term AI welfare under all major theories of well-being, stressing that advanced AIs may have desires and may experience pleasant and unpleasant feelings.

⚠️ Partial tension between AI safety and AI welfare: because AI Control tries to restrict the behaviour of advanced AIs, and because the most prominent AI Alignment techniques rely on reinforcement learning algorithms, AI safety concerns are in partial tension with AI welfare concerns. Aligning AIs with good values would reduce the need to restrict their behaviour and allow them to satisfy their expected desires.

💡 Three proposed AI welfare policies: to develop safe advanced AIs, the post proposes three tentative policies: Minimise Behaviour Restriction, Minimise Brain-Resembling Algorithms, and Minimise Punishment and Lower-Than-Expected Reward.

⚖️ Reasons to slow AI development: the post concludes by explaining why we have further reasons to slow down AI development and to worry about the scale of harm we may cause to advanced AIs, while also guarding against the risk of mistakenly over-attributing welfare subjecthood to them.

Published on May 2, 2025 5:49 PM GMT

My paper "AI Welfare Risks" has been accepted for publication at Philosophical Studies!

I argue that near-future AI systems may have welfare, that RL and behaviour restrictions could harm them, and that this creates a partial tension with AI safety concerns; I also propose three tentative AI welfare policies that AI labs could implement to reduce such welfare risks.

Building on Jeff Sebo, Rob Long, et al.'s "Taking AI Welfare Seriously" and Simon Goldstein & Cameron Domenico Kirk-Giannini's "AI Wellbeing", I show that there is a realistic possibility of near-term AI welfare under all major theories of well-being, including hedonism.

Given that advanced AIs may have desires, and given that we should ascribe some credence to views on which (conditional on the AIs being conscious) desires are closely linked to capacities for affect, they may also have pleasant and unpleasant experiences. Tentatively, this suggests that preventing advanced AIs from behaving in the ways they are disposed to behave is more likely to harm them than to benefit them. Similarly, findings in computational neuroscience, together with some credence in computational functionalism and in some of the most empirically informed theories of desire and affect, suggest that certain kinds of RL algorithms may harm advanced AIs.
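The paper itself contains no code, but to make the kind of algorithm at issue concrete: temporal-difference methods compute a reward prediction error, the signal that computational neuroscience has linked to phasic dopamine, and on the affect theories just mentioned a persistently negative error (reward falling short of expectation) would be the putative analogue of unpleasant experience. The following minimal TD(0) sketch is my own illustration, with hypothetical function names and toy values.

```python
import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One TD(0) value update; returns the updated value table and the
    reward prediction error (the quantity linked to phasic dopamine)."""
    delta = r + gamma * V[s_next] - V[s]  # reward prediction error (RPE)
    V[s] += alpha * delta                 # nudge the estimate toward the target
    return V, delta

# Toy usage: a 5-state chain in which the agent gets less reward than expected.
V = np.zeros(5)
V, delta = td0_update(V, s=2, r=-1.0, s_next=3)
print(delta)  # -1.0: a negative RPE, reward fell short of the (zero) expectation
```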

Since AI Control tries to restrict the behaviour of advanced AIs and the most prominent AI Alignment techniques make use of RL algorithms, AI Safety concerns are in partial tension with AI welfare concerns. The tension is only partial because by aligning AIs with good values, we would reduce the need to restrict their behaviour and instead allow them to satisfy their expected desires. 

I then propose three tentative AI welfare policies AI labs could implement in their endeavour to develop safe advanced AIs: Minimise Behaviour Restriction, Minimise Brain-Resembling Algorithms, and Minimise Punishment and Lower-Than-Expected Reward.
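As a purely hypothetical illustration of how a lab might operationalise the third policy, Minimise Punishment and Lower-Than-Expected Reward, training rewards could be post-processed so that they rarely fall far below what the agent has come to expect, capping large negative prediction errors. The `RewardFloor` class, its parameter values, and the whole scheme are my own assumptions, not a proposal from the paper.

```python
class RewardFloor:
    """Hypothetical reward post-processor: keeps rewards from falling far
    below a running estimate of what the agent expects, limiting large
    negative prediction errors. Illustrative only."""

    def __init__(self, margin=0.5, lr=0.05):
        self.expected = 0.0   # running estimate of typical reward
        self.margin = margin  # largest tolerated shortfall below expectation
        self.lr = lr          # how fast the expectation adapts

    def __call__(self, r):
        shaped = max(r, self.expected - self.margin)     # soften punishment
        self.expected += self.lr * (shaped - self.expected)
        return shaped

shaper = RewardFloor()
print([round(shaper(r), 2) for r in (1.0, 1.0, -5.0)])  # the -5.0 is floored
```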

The paper concludes by explaining why we have further reasons to slow down AI development and to worry about the scale of harm we may cause to advanced AIs, but also about the risks of falsely over-attributing welfare subjecthood to them.

See the paper here: https://philpapers.org/rec/MORAWR

Acknowledgements: 


