AISafety.info: What are Inductive Biases

 


Published on September 19, 2024 5:26 PM GMT

AISafety.info writes AI safety intro content. We'd appreciate any feedback.

 

The inductive bias of a learning process is its tendency to learn particular kinds of patterns from data. It would help with AI alignment to know which learning processes have inductive biases toward patterns used in human cognition, and in particular human values.

 

In a learning process L, you have some class of models in which you look for an approximation M to some finite set of data D. This data is generated by some function f, which you usually don't know. You want M to replicate the target function f as closely as possible. Since you can almost never get enough data to fully infer f, you have to hope that your learning process L will find an approximation M that stays close to f even far from the training data. If there's a class of target functions that L can approximate well, we say that L has an inductive bias toward such functions.
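
Put a bit more explicitly (the notation below is ours, just restating the paragraph above):

$$D = \{(x_i, f(x_i))\}_{i=1}^{n}, \qquad M = L(D), \qquad \text{and we hope that } M(x) \approx f(x) \text{ even for } x \text{ far from the } x_i.$$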

 

 

 

 

Example: Let’s say you want to predict how energetic you will feel over time. That’s your target function. So for a couple of days, whenever you get the chance, you log how energetic you feel (that’s your data D). And because you’re lazy, you decide to just draw straight lines between the data points. At the edges, you just keep drawing the outermost lines you got from D. That’s your learning process L, which will spit out some piecewise linear curve M. This is a poor learning process, for reasons we’ll get to in a bit.
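
For concreteness, here is a minimal sketch of this learning process in Python. Everything in it (NumPy, the name fit_piecewise_linear) is our own illustration, not part of the example itself:

```python
import numpy as np

def fit_piecewise_linear(xs, ys):
    """Illustrative learning process L: connect the data points with straight
    lines and keep extending the outermost lines beyond the range of the data."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    order = np.argsort(xs)
    xs, ys = xs[order], ys[order]

    def M(x):
        x = np.asarray(x, dtype=float)
        inside = np.interp(x, xs, ys)  # straight lines between data points
        # Extend the first and last segments past the edges of the data.
        left = ys[0] + (ys[1] - ys[0]) / (xs[1] - xs[0]) * (x - xs[0])
        right = ys[-1] + (ys[-1] - ys[-2]) / (xs[-1] - xs[-2]) * (x - xs[-1])
        return np.where(x < xs[0], left, np.where(x > xs[-1], right, inside))

    return M  # the learned piecewise linear model M
```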

 

You run the experiment, and then plot the results in Figure 1. The resulting gray line isn’t that bad an approximation — at least, not near existing data points. And if we have dense enough data within some range, the approximation therein will be arbitrarily good. 

 

Far from our data points, our approximation M will fall indefinitely. This does not match our everyday experience, in which our energy fluctuates. This is a fundamental issue with this choice of L: it will never globally approximate a periodic function well[1], no matter how much data you give it. Almost all the functions that L can learn will keep growing or falling far outside the training data. That is an inductive bias of this learning process.[2]
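
Reusing fit_piecewise_linear (and NumPy) from the sketch above, with a made-up 24-hour sine wave standing in for the true energy function, the failure is easy to see numerically:

```python
# Toy stand-in for the target f: a rough 24-hour energy cycle (purely illustrative).
f = lambda t: 5 + 3 * np.sin(2 * np.pi * t / 24)

rng = np.random.default_rng(0)
t_train = np.sort(rng.uniform(0, 48, size=20))   # two days of sporadic logs
M = fit_piecewise_linear(t_train, f(t_train))

t_near = np.linspace(1, 47, 200)      # inside the logged range
t_far = np.linspace(200, 248, 200)    # roughly a week later
print(np.max(np.abs(M(t_near) - f(t_near))))  # modest error near the data
print(np.max(np.abs(M(t_far) - f(t_far))))    # typically enormous far from D
```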

 

Why does any of this matter for AI or AI alignment? Because we want our AI to have an inductive bias towards human values when learning what humans care about. Not only do we want it to fit the training data well; we also don't want it to go crazy outside the data distribution, because an ASI (artificial superintelligence) would likely experience distributional shift.[3]

 

We don't know what the inductive biases of an ASI would be, and in particular whether they would match those of humans. But we can think about the implications of possible initial architectures for ASI. For instance, if ASI emerges from the deep learning paradigm, then it will probably start off with the inductive biases common to neural networks. For many architectures, we know that neural networks are biased towards low-frequency functions, even at initialization.
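
As a toy illustration of that low-frequency bias (our own experiment, assuming PyTorch is available; the frequencies, network size, and training length are arbitrary choices):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-1, 1, 512).unsqueeze(1)
low = torch.sin(2 * torch.pi * x)            # low-frequency component
high = 0.5 * torch.sin(40 * torch.pi * x)    # high-frequency component
y = low + high                               # the target to fit

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2001):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()
    if step % 500 == 0:
        with torch.no_grad():
            pred = net(x)
        # How close is the network to the smooth part alone vs. the full target?
        print(step,
              nn.functional.mse_loss(pred, low).item(),
              nn.functional.mse_loss(pred, y).item())
# Typically the network tracks the low-frequency component first; the
# high-frequency wiggles are only picked up much later, if at all.
```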

 

If we can determine that a given behavior is high-frequency, then we can have some confidence that neural networks will fail to learn it by default. Indeed, Nora Belrose and Quintin Pope have argued that a treacherous turn is a high-frequency behavior, as it involves an abrupt shift in behavior from the training distribution to the test distribution. This safety argument is an example of the sort of reasoning we could do if we knew the inductive biases of AI models.

 

  1. ^

    Barring constant functions

  2. ^

     We can easily see that L has the wrong bias, and we can think of ways to build a periodic function out of the approximation, say by stitching together infinitely many copies of it. But that would be us intervening in L, so it doesn't affect L's inductive bias.

  3. ^

     This is true even if only because an ASI would certainly have the power to cause the world to change drastically.



