Does VETLM solve AI superalignment?

 

This article explores a proposed solution to superalignment. The author puts forward a method called Volition Extrapolated by Language Models (VELM), and further proposes Truthful Language Models (TLM) to address the falsehoods present in training data. The author argues that a solution to superalignment is critical to future AI safety, responds to anticipated objections to the proposal, and calls for deeper discussion and study of it.

🤔 **Volition Extrapolated by Language Models (VELM)**: The author proposes a method called Volition Extrapolated by Language Models (VELM), which aims to achieve superalignment by using language models to extrapolate human volition. The approach relies on the learning ability of language models: by analyzing large bodies of text, a model can infer human values and goals, which then serve as the AI system's guiding objective.

🧐 **Truthful Language Models (TLM)**: The author recognizes that the training data of existing language models contains a large amount of false information, which could lead an AI system to flawed inferences. The author therefore proposes Truthful Language Models (TLM), which aim to improve a language model's accuracy by training it on truthful, reliable data, so that human volition can be extrapolated more faithfully.

🤔 **The need for a superalignment solution**: The author argues that a solution to superalignment is critical to future AI safety, because rapid AI progress could pose enormous risks to human society. The author calls on people to take superalignment research seriously and to actively search for effective solutions.

🧐 **Responses to objections**: The author replies to anticipated objections to the proposal, including questions about its novelty, the lack of an experimental section, and the absence of a mathematical proof. The author argues that these objections are not decisive and stresses the need for further research.

🤔 **Superalignment and AI safety**: The author argues that a superalignment solution would not only keep AI safe but also advance AI itself: only if AI is aligned with human goals can its potential be realized for the greater benefit of human society.

Published on August 8, 2024 6:22 PM GMT

Eliezer Yudkowsky’s main message to his Twitter fans is, in essence, to stop AGI development.

Aligning human-level or superhuman AI with its creators’ objectives is also called “superalignment”. And a month ago, I proposed a solution to that. One might call it Volition Extrapolated by Language Models (VELM).

Apparently, the idea was novel (not the “extrapolated volition” part).


But it suffers from the fact that language models are trained on large bodies of Internet text. And this includes falsehoods. So even in the case of a superior learning algorithm[1], a language model using it on Internet text would be prone to generating falsehoods, mimicking those who generated the training data.

So a week later, I proposed a solution to that problem too. Perhaps one could call it Truthful Language Models (TLM). That idea was apparently novel too. At least no one seems to be able to link prior art.

 


Its combination with the first idea might be called Volition Extrapolated by Truthful Language Models (VETLM). And this is what I was hoping to discuss.
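For concreteness, here is a minimal, purely illustrative sketch of how such a pipeline might be wired together. It is not an implementation of the actual proposals: the truthfulness scores, the filter threshold, the stub "model", and the extrapolation prompt below are stand-in placeholders of my own.

```python
# Illustrative sketch only: a toy VETLM-style pipeline.
# The truthfulness scores, the filter threshold, the stub model, and the
# extrapolation prompt are placeholders, not the actual proposals.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    text: str
    # Placeholder truthfulness score in [0, 1]; in practice obtaining this
    # is the hard part, and this sketch simply assumes it exists.
    truthfulness: float


def build_tlm_corpus(docs: List[Document], threshold: float = 0.9) -> List[str]:
    """TLM step (sketch): keep only documents judged sufficiently truthful."""
    return [d.text for d in docs if d.truthfulness >= threshold]


def train_language_model(corpus: List[str]) -> Callable[[str], str]:
    """Stand-in for training a language model on the filtered corpus.

    Returns a trivial echo function so the sketch runs; a real system
    would return a trained model's generate() function.
    """
    def generate(prompt: str) -> str:
        return f"[model trained on {len(corpus)} truthful docs] response to: {prompt}"
    return generate


def extrapolate_volition(generate: Callable[[str], str], situation: str) -> str:
    """VELM step (sketch): ask the truthful model what humanity would want."""
    prompt = (
        "Given everything you know about human values, what would humanity, "
        f"on reflection, want done in the following situation?\n{situation}"
    )
    return generate(prompt)


if __name__ == "__main__":
    raw_corpus = [
        Document("Accurate encyclopedia article.", truthfulness=0.95),
        Document("Viral misinformation post.", truthfulness=0.10),
    ]
    tlm_corpus = build_tlm_corpus(raw_corpus)
    generate = train_language_model(tlm_corpus)
    print(extrapolate_volition(generate, "allocating compute for a new AI system"))
```

Deciding which documents are truthful in the first place is, of course, the whole difficulty; nothing in this sketch pretends to solve that.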

But this community showed little interest. When I posted it, the post started at +3 points, and it is still there. Assuming that AGI is inevitable, shouldn’t proposed solutions to superalignment be almost infinitely important, rationally speaking?

I can think of five possible critiques:

1. It’s not novel. If so, please do post links to prior art.
2. It doesn’t have an “experiments” section. But do experimental results using modern AI transfer to future AGI, possibly using very different algorithms?
3. It’s a hand-waving argument; there is no mathematical proof. But many human concepts are hard to pin down mathematically. Even if a mathematical proof can be found, your disinterest does not exactly help.
4. Promoting the idea that a solution to superalignment exists doesn’t jibe with the message “stop AGI”. But why do you want to stop it, if it can be aligned? Humanity has other existential risks; they should be weighed against each other.
5. Entertaining the idea that a possible solution to superalignment exists does not help AI safety folks’ job security. Tucker Carlson recently released an episode arguing that “AI safety” is a grift and/or a cult. But I disagree with both. People should try to find the mathematical proofs, if possible, and flaws, if any.
     
1. ^ Think “AIXI running on a hypothetical quantum supercomputer”, if this helps your imagination. But I think that superior ML algorithms will be found for modern hardware.



