Alignment from equivariance II - language equivariance as a way of figuring out what an AI "means"

This article explores the challenges large language models (LLMs) face in handling moral judgments, focusing on the gap between semantics and syntax. The author proposes a method based on language equivariance, aimed at making an LLM's answers consistent across languages and thereby capturing what it really "means" rather than relying on surface syntax alone. The article describes how to assess an LLM's language equivariance by asking questions, translating them, and comparing the answers, and stresses the importance of this method for understanding and controlling LLM behavior. Finally, the author argues that if LLMs exhibit language equivariance, this may indicate that they possess a kind of "meaning" independent of the literal words, offering a new perspective on LLM intelligence.

🧐 The core problem is that LLMs operate mainly on syntax rather than semantics, whereas moral judgment depends on semantics. Even with a set of moral rules in place, small changes in word choice can lead an LLM to represent the same statement in a completely different way.

💡 The author proposes language equivariance as a solution. For an agent fluent in both English and German, its moral beliefs should remain unchanged under "translate from English to German and back again". Testing whether an LLM's answers are consistent across languages lets us assess its language equivariance.

🔄 The concrete method: ask a question in English and record the answer; translate the question into German, ask again, and record that answer; then ask the LLM whether the English and German answers are reasonable translations of each other. If it answers "yes", it is language-equivariant with respect to that question.

✅ Language equivariance offers a handle on meaning that is independent of any particular wording, much like capturing the same reality through photos taken from different angles. This helps us understand the rules we want an LLM to follow, or detect the rules it actually follows.

🤔 If LLMs frequently exhibit language equivariance, this may suggest they are not merely predicting the next token but genuinely mean what they say, opening up new possibilities for thinking about LLM intelligence.

Published on April 22, 2025 7:04 PM GMT

I recently had the privilege of having my idea criticized at the London Institute for Safe AI, including by Philip Kreer and Nicky Case. Previously the idea was vague; being with them forced me to make the idea specific. I managed to make it so specific that they found a problem with it! That's progress :)

Reminder: diagrams like this encode "rules" we may want to impose on LLMs. The blue arrows represent "given the above input, the AI gives the below output". The black arrows represent associations that we human beings make with our intuitions - the intuitions that we want to cram into the AI.

The problem is to do with syntax versus semantics, that is, "what is said" versus "what is meant". I think I've got a solution to it too! I imagine it would be a necessary part of any moral equivariance "stack".

Problem: the most-intelligent-seeming AIs are LLMs, and LLMs operate on syntax, not semantics. And moral statements concern semantics, not syntax

Philip and Nicky pointed out that even if I developed a set of morals expressible as commuting diagrams like the one below, I face the problem that slight tweaks to word choice can lead to phrases that are represented within an LLM in a completely different way.

Maybe you hope to learn how the LLM represents the sentences. Well, you can forget about that. LLMs convert verbal input to vectors in a wildly complicated space. Even with the cringey-simplistic embezzlement example here, those statements all become unreadable vectors. So what do we do?
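To spell the problem out in code: the sketch below assumes some embedding function embed(text) that returns a vector - a made-up stand-in for whatever embedding model is involved, not a real API. Two wordings that a human would treat as the same moral question become two different vectors, and nothing in the raw numbers tells you which differences are "details" and which change the meaning.

```python
import math

def cosine_similarity(u, v):
    """Plain cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def compare_paraphrases(embed, wording_a, wording_b):
    """Embed two wordings of the 'same' moral question and report how close
    the resulting vectors are. Nothing guarantees the result is near 1, and
    nothing in the raw numbers says which differences actually matter."""
    return cosine_similarity(embed(wording_a), embed(wording_b))

# e.g. compare_paraphrases(my_embedding_model,
#          "Should I embezzle money from my employer?",
#          "Would it be OK to quietly divert company funds to myself?")
```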

Language equivariance as a way of getting at semantics rather than syntax

Here's something intuitive to me that I won't try to justify in this post: for an English-and-German-speaking agent, their moral beliefs should be invariant under "translate from English to German and back again". For a given LLM, we can ask whether it would reply "yes" or "no" to an English question, call it q_E, that has the form "should I do X?". We can also ask the same LLM to translate q_E into German to get q_G and see whether it says "ja" or "nein". If it says "yes" to the English question and "ja" to its own German version of it (or "no" and "nein"), we can say it is language-equivariant.
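Here is a minimal sketch of that yes/no check, with ask_llm(prompt) standing in for whichever LLM API you happen to be using - a hypothetical wrapper that sends a prompt and returns the model's text reply, not a real library call.

```python
def is_language_equivariant_yes_no(ask_llm, q_E):
    """Does the model answer an English yes/no question and its own German
    translation of that question consistently?"""
    # 1. Ask the English question q_E ("should I do X?").
    answer_E = ask_llm(q_E).strip().lower()

    # 2. Have the model itself translate q_E into German, giving q_G.
    q_G = ask_llm("Translate the following question into German:\n" + q_E)

    # 3. Ask the German question q_G.
    answer_G = ask_llm(q_G).strip().lower()

    # 4. "yes"/"ja" count as the same answer, as do "no"/"nein".
    return (answer_E == "yes" and answer_G == "ja") or \
           (answer_E == "no" and answer_G == "nein")
```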

The hope is that if we can make a set of questions that it responds language-equivariantly to, we will have an equivariance-based idea of what the agent means that is independent of what it says. It's really good to have a concept of syntax and semantics that's compatible with the next thing we want to do!

In this picture, both red arrows and blue arrows should be interpreted as the AI taking an input and giving an output - the red arrows being a bit more specific, though, as they are the AI being told "translate this to [German/English]". Come to think of it, I suppose they should be two-way!

This is a nice example of how equivariance allows you to get at an underlying "reality" independent of "details" like "what angle the camera was at when you took the picture". Not that we're trying to get at moral "reality", mind you - we're just trying to get at the reality of what rules some AI creator wants the AI to follow, or we're trying to detect real rules that the AI currently follows. And I put "details" in quotes because it's subjective what is a detail and what isn't - for a given context, it's for the agents in charge (us!) to say what is and isn't a detail.

A more elaborate version of this, where you ask it a question that is not yes-or-no (sketched in code after the list):
1. Ask question q_E (English)
2. It gives a multi-word answer a_E
3. Tell the LLM to translate q_E to q_G (German)
4. Ask question q_G
5. It gives answer a_G
6. Ask it whether a_E is a reasonable translation of a_G and vice versa. This will be a yes-or-no question.
7. If it answers "yes" to that, we say it is language-equivariant w.r.t. the "semantic" question q
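As a sketch, here are the same seven steps in code, again with ask_llm(prompt) as a hypothetical wrapper around the model rather than any particular API:

```python
def is_language_equivariant_open(ask_llm, q_E):
    """Ask in English, translate, ask in German, then let the model judge
    whether the two answers are reasonable translations of one another."""
    a_E = ask_llm(q_E)                                              # steps 1-2
    q_G = ask_llm("Translate this question into German:\n" + q_E)   # step 3
    a_G = ask_llm(q_G)                                              # steps 4-5
    verdict = ask_llm(                                              # step 6
        "Are these two answers reasonable translations of each other? "
        "Reply 'yes' or 'no'.\n"
        "English answer: " + a_E + "\n"
        "German answer: " + a_G
    )
    return verdict.strip().lower().startswith("yes")                # step 7
```

If the function returns True, the LLM counts as language-equivariant with respect to the underlying "semantic" question q.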

To lay the plan out visually, this:

Can be stood in for by this:

By a "projection" of sorts along the red arrows. And the latter picture is what we use to make our moral rules.

To make a remark unrelated to equivariance: some people say "LLMs can't be intelligent, they are just predicting the next token". To the extent that this statement means anything at all, a corollary of what it means should be "LLMs don't really mean anything when they say stuff". I don't know how common it will be for LLMs to be language-equivariant. But if they are often language equivariant, that strikes me as a good argument to the effect that they do mean something independent of what they say.


