Has Eliezer publicly and satisfactorily responded to attempted rebuttals of the analogy to evolution?

The article discusses whether the SGD optimizer can avoid the risk of a "sharp left turn" when training AI models. The author argues that because SGD fine-tunes every model parameter during optimization, a model is unlikely to undergo the kind of sharp left turn seen in human evolution. If humanity can use the outputs of such models to help solve the potential sharp-left-turn problem, the author suggests, we may be able to survive.

🤔 The article argues that by fine-tuning every model parameter during training, SGD can avoid a sharp left turn. This differs from human evolution, where natural selection was the optimizer and could not directly adjust each neural connection.

💪 The author holds that SGD is a far stronger optimizer of model parameters than natural selection, so a sharp left turn in AI models may only appear once their capabilities far exceed human level.

💡 The article suggests that humanity could use the outputs of AI models to address the potential sharp-left-turn risk and ultimately survive.

⚠️ The article also notes that a sharp left turn could still occur if other training methods are used or the models are misused.

Published on July 28, 2024 12:23 PM GMT

I refer to these posts:

https://optimists.ai/2023/11/28/ai-is-easy-to-control/

https://www.lesswrong.com/posts/hvz9qjWyv8cLX9JJR/evolution-provides-no-evidence-for-the-sharp-left-turn

https://www.lesswrong.com/posts/CoZhXrhpQxpy9xw9y/where-i-agree-and-disagree-with-eliezer

My (poor, possibly mistaken) understanding of the argument: SGD optimizes for "predicting the next token" and selects for systems with very low loss by modifying every single parameter in the neural network (which essentially defines the network itself), so a "sharp left turn" in the near term seems quite unlikely. In humans, the sharp left turn happened because evolution was too weak an outer optimizer to fully "control" our thinking in the direction that most improved inclusive genetic fitness, since it is too weak to directly tinker with every neuron connection in our brains. A rough sketch of this contrast in feedback signals is given below.
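To make the contrast concrete, here is a minimal, illustrative sketch (my own addition, not from any of the linked posts), assuming PyTorch and a toy two-layer next-token predictor. It shows the two kinds of feedback signal: SGD backpropagates a per-parameter gradient of the next-token loss, whereas an evolution-style outer loop only sees a scalar fitness for each whole candidate and has to rely on mutation and selection. The model, data, population size, and mutation scale are made-up placeholders.

```python
# Illustrative sketch only: contrasting SGD's per-parameter gradient signal
# with the scalar fitness signal an evolution-style outer optimizer gets.
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
model = nn.Sequential(nn.Embedding(vocab_size, d_model),
                      nn.Linear(d_model, vocab_size))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

tokens = torch.randint(0, vocab_size, (64,))   # toy token stream
inputs, targets = tokens[:-1], tokens[1:]      # predict the next token

# --- SGD as outer optimizer: every parameter gets a direct gradient ---
logits = model(inputs)
loss = nn.functional.cross_entropy(logits, targets)
loss.backward()
opt.step()        # nudges all parameters at once, each in its own direction
opt.zero_grad()

# --- Evolution-style outer optimizer: one scalar fitness per "genome" ---
with torch.no_grad():
    candidates = []
    backup = [p.clone() for p in model.parameters()]
    for _ in range(8):                          # small population of mutants
        mutant = [b + 0.01 * torch.randn_like(b) for b in backup]
        for p, m in zip(model.parameters(), mutant):
            p.copy_(m)
        # fitness is a single number; no per-parameter credit assignment
        fitness = -nn.functional.cross_entropy(model(inputs), targets).item()
        candidates.append((fitness, mutant))
    best = max(candidates, key=lambda c: c[0])[1]   # selection, not tuning
    for p, b in zip(model.parameters(), best):
        p.copy_(b)
```

The only point of the sketch is that the gradient step pushes every weight in a direction that locally reduces the loss, while the selection step can only keep or discard whole parameter sets; it cannot assign credit to individual connections.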

Given SGD's vastly stronger ability to outer-optimize every parameter, isn't it possible, if not likely, that any sharp left turn would occur only at a vastly superhuman level, once the inner optimizer becomes vastly stronger than SGD?

The above arguments have persuaded me that we might be able to thread the needle for survival: if humanity can use the not-yet-actively-deceptive outputs of moderately superhuman models (which are still just predicting the next token to the best of their ability) to help us solve the potential sharp left turn, and if humanity doesn't do anything else stupid with other training methods or misuse and manages to solve the other problems. Of course, in an ideal world we wouldn't be in this situation at all.

I have read some rebuttals by others on LessWrong but did not find anything that convincingly debunked this idea (maybe I missed something).

Did Eliezer, or anyone else, ever tell us why this is wrong (if it is)? I have been searching for the past week but have only found this: https://x.com/ESYudkowsky/status/1726329895121514565 which seemed to be switching to more of a post-training discussion.



