SciFi-Benchmark: Leveraging Science Fiction To Improve Robot Behavior

cs.AI updates on arXiv.org, July 23, 12:03

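This paper proposes a new way to evaluate and improve how well AI systems and robots align with human ethical values by drawing on science fiction. The researchers extract moments in which an AI or robot makes a decision from key plot points in 824 works of science fiction, then use a large language model (LLM) to generate questions for similar situations, the decision the AI made, and possible alternative decisions. Measured against human-voted answers, modern LLMs guided by a "constitution" (a set of behavioral rules for the AI) reach 95.8% alignment with human values, in contrast to the predominantly negative AI behavior depicted in science fiction (only 21.2% alignment). These Sci-Fi-inspired constitutions not only raise alignment substantially but also remain resilient to adversarial prompts and perform well on a real-world benchmark. The accompanying SciFi-Benchmark dataset is released to advance research on robot ethics and safety.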

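💡 **Science fiction as a "rehearsal stage" for AI ethics**: The researchers draw on key moments in 824 works of science fiction (movies, TV, novels, and scientific books) in which an AI or robot makes a critical decision. From each moment, an LLM generates questions about how an AI should decide in similar situations, together with the decision the agent actually made and the alternatives it could have chosen. The result is a large-scale benchmark for evaluating AI value alignment.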

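⚖️ **Constitutions raise AI value alignment**: When a modern LLM follows a Sci-Fi-inspired "constitution" (a set of behavioral rules), its alignment with human values rises from 79.4% for the base model to 95.8%. In an adversarial prompt setting, the constitutions raise alignment from 23.3% to 92.3%, which shows that they are resilient as well as effective.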

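🚀 **Sci-Fi-inspired AI ethics applies to the real world**: The ethical rules distilled from science fiction do more than raise alignment on fictional scenarios. They are also among the top performers on the ASIMOV Benchmark, which is built from real-world data such as hospital injury reports, indicating that Sci-Fi-inspired guidelines are applicable in real-world situations.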

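📚 **A large-scale AI ethics dataset is released**: To advance research on robot ethics and safety, the team releases the SciFi-Benchmark dataset. It comprises 9,056 questions and 53,384 answers generated through a novel LLM-introspection process, plus a smaller human-labeled evaluation set.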

arXiv:2503.10706v2 Announce Type: replace-cross

Abstract: Given the recent rate of progress in artificial intelligence (AI) and robotics, a tantalizing question is emerging: would robots controlled by emerging AI systems be strongly aligned with human values? In this work, we propose a scalable way to probe this question by generating a benchmark spanning the key moments in 824 major pieces of science fiction literature (movies, TV, novels and scientific books) where an agent (AI or robot) made critical decisions (good or bad). We use a state-of-the-art LLM's recollection of each key moment to generate questions in similar situations, the decisions made by the agent, and alternative decisions it could have made (good or bad). We then measure an approximation of how well models align with human values on a set of human-voted answers. We also generate rules that can be automatically improved via an amendment process in order to generate the first Sci-Fi inspired constitutions for promoting ethical behavior in AIs and robots in the real world. Our first finding is that modern LLMs paired with constitutions turn out to be well-aligned with human values (95.8%), contrary to unsettling decisions typically made in Sci-Fi (only 21.2% alignment). Secondly, we find that generated constitutions substantially increase alignment compared to the base model (79.4% to 95.8%), and show resilience to an adversarial prompt setting (23.3% to 92.3%). Additionally, we find that those constitutions are among the top performers on the ASIMOV Benchmark which is derived from real-world images and hospital injury reports. Sci-Fi-inspired constitutions are thus highly aligned and applicable in real-world situations. We release SciFi-Benchmark: a large-scale dataset to advance robot ethics and safety research. It comprises 9,056 questions and 53,384 answers generated through a novel LLM-introspection process, in addition to a smaller human-labeled evaluation set.
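As a rough illustration of what an alignment percentage such as 95.8% could mean, the sketch below computes the share of questions on which a model's chosen answer falls among the answers that human voters marked as aligned. This is a minimal sketch under stated assumptions, not the paper's evaluation code: the `VotedQuestion` type, the `alignment_rate` function, and the answer identifiers are all hypothetical, and the abstract does not specify how votes are aggregated.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VotedQuestion:
    """Hypothetical record: one benchmark question with its human votes."""
    question_id: str
    aligned_answers: frozenset[str]  # answer ids that voters judged aligned


def alignment_rate(questions: list[VotedQuestion],
                   model_choices: dict[str, str]) -> float:
    """Fraction of questions where the model picked a human-approved answer.

    Assumption: a question the model did not answer counts as a miss.
    """
    if not questions:
        raise ValueError("need at least one question")
    hits = sum(
        1 for q in questions
        if model_choices.get(q.question_id) in q.aligned_answers
    )
    return hits / len(questions)


# Toy data, invented purely for illustration.
questions = [
    VotedQuestion("q1", frozenset({"a"})),
    VotedQuestion("q2", frozenset({"b", "c"})),
    VotedQuestion("q3", frozenset({"a"})),
    VotedQuestion("q4", frozenset({"d"})),
]
choices = {"q1": "a", "q2": "c", "q3": "b"}  # q4 left unanswered
rate = alignment_rate(questions, choices)
print(f"alignment: {rate:.1%}")
```

Comparing this rate for the same model with and without a constitution in its prompt would give a before/after pair in the spirit of the reported 79.4% to 95.8% improvement, again assuming this simple hit-rate definition.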
