Searching for Impossibility Results or No-Go Theorems for Provable Safety

LessWrong, September 28, 2024

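The post asks for specific results on provable safety to help guide related research. It mentions Yampolskiy's existing paper but wants more targeted results, and it is also interested in impossibility results in non-trivial toy models and in other related material of interest.

🎯 The post seeks specific results on provable safety, such as results showing that various approaches are impossible or that the proofs belong to a particular complexity class, in order to guide research directions. It mentions Yampolskiy's paper but considers it insufficiently targeted and wants results with more practical guidance value.

🧐 Many results on provable safety come from computability theory and are too general to be practically useful. For example, a theorem stating that one cannot formally verify all possible mathematical proofs says little about which constrained systems can be verified.

🤔 The author is interested in impossibility results for alignment problems in non-trivial toy models (such as RL environments). These results should not be mere corollaries of more general theorems but should have value of their own.

📚 Finally, the author would welcome any other references or information that might be interesting and generally relevant, to further enrich research on provable safety.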

Published on September 27, 2024 8:12 PM GMT

I am looking for results showing that various approaches to provable safety are impossible, or that such proofs fall into a particular complexity class. I have Yampolskiy's paper "Impossibility Results in AI: A Survey," but I am looking for more targeted results that would help guide research into provable safety. Many of the results seem to come from computability theory and are so general that they are not very useful.

A theorem stating that one cannot formally verify all possible mathematical proofs says little about which constrained systems can be verified.
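To make the distinction concrete: for a system with finitely many states, the safety property "no unsafe state is reachable" can be decided by exhaustive search, even though the analogous question for arbitrary programs is undecidable. The following is a minimal sketch under that finite-state assumption. The toy system and the names `step_even`, `step_any`, and `is_unsafe` are hypothetical illustrations, not something taken from the post or from Yampolskiy's survey.

```python
from collections import deque


def reachable_states(initial, successors):
    """Breadth-first exploration of every state reachable from `initial`.

    Terminates whenever the reachable state space is finite.
    """
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_safe(initial, successors, unsafe):
    """Return True if no reachable state satisfies the `unsafe` predicate."""
    return not any(unsafe(s) for s in reachable_states(initial, successors))


# Hypothetical toy system: a counter modulo 6.
def step_even(state):
    # Only even steps are allowed, so odd states are never reached from 0.
    return [(state + 2) % 6, (state + 4) % 6]


def step_any(state):
    # Steps of 1 or 2 are allowed, so every state is eventually reachable.
    return [(state + 1) % 6, (state + 2) % 6]


def is_unsafe(state):
    # Assumption for illustration: state 3 is the single "unsafe" state.
    return state == 3


if __name__ == "__main__":
    print(is_safe(0, step_even, is_unsafe))
    print(is_safe(0, step_any, is_unsafe))
```

The check is a proof by exhaustion, so its cost grows with the number of reachable states. That is one reason a complexity-class result for a specific constrained system can be more informative than a general undecidability theorem.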

I would also be interested in impossibility results in non-trivial toy models of alignment problems (RL environments) that are not simply corollaries of much more general theorems.

Lastly, given everything written above, I would also welcome any other references or information that a person might reasonably expect me to find interesting and generally related.

