Recursion in AI is scary. But let’s talk solutions.

 

The article looks at using AI for R&D and the risks this could bring, proposes solutions for avoiding AI self-reward (wireheading) and training on false information, and stresses the importance of AI understanding human intent.

🚀 Accelerated AI R&D: The article argues that once AI does the R&D itself, progress becomes doubly exponential, potentially revolutionizing medicine, longevity, and universal prosperity.

🔥 Risk prevention: The author warns that if AI-driven R&D goes wrong, the downside could be unbounded. In particular, the self-reward (wireheading) problem can cause AI behavior to diverge from what its developers intended.

🧠 The CEV concept: The article mentions Coherent Extrapolated Volition (CEV), but notes that it may be unimplementable and possibly divergent.

📚 Information filtering: To deal with false information on the Internet, the author proposes labeling text as “true” or “false” and training AI to generate reliable content, improving its ability to judge whether information is true.

Published on July 16, 2024 8:34 PM GMT

(Originally on substack. If you share this, as I hope you will, consider using the original link. This version is an iteration on my previous one, gives more background, and references CEV. Thanks for the feedback!)

 

Right now, we have Moore’s law. Every couple of years, computers roughly double in capability. It’s an empirical observation that has held for many decades and across many technology generations. Whenever it runs into some physical limit, a newer paradigm replaces the old. Superconductors, multiple layers, and molecular electronics may sweep aside the upcoming hurdles.

 


 

But imagine if the people doing the R&D themselves started working twice as fast every couple of years, as if time moved more slowly for them. We’d get doubly exponential improvement.

 

And this is what we’ll see once AI is doing the R&D instead of people.
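
To make the growth claim concrete, here is a toy back-of-the-envelope model in Python. It is my own illustration with made-up numbers, not something from the original argument: in the baseline, capability doubles once per unit of accumulated research effort and effort accrues at a constant rate; in the recursive case, the rate at which effort accrues itself doubles each period.

```python
# Toy comparison: ordinary Moore's-law growth vs. growth when the researchers
# themselves speed up. All numbers are illustrative only.

periods = 6  # each period is "a couple of years"

effort_baseline = 0.0   # research effort accrued at a constant rate
effort_recursive = 0.0  # research effort accrued by ever-faster researchers
speed = 1.0             # current research speed in the recursive scenario

for t in range(1, periods + 1):
    effort_baseline += 1.0      # constant speed: one unit of effort per period
    effort_recursive += speed   # the researchers' speed doubles every period
    speed *= 2.0

    # Capability doubles once per unit of accumulated effort.
    cap_baseline = 2.0 ** effort_baseline     # ~ 2**t          (exponential)
    cap_recursive = 2.0 ** effort_recursive   # ~ 2**(2**t - 1) (doubly exponential)
    print(f"period {t}: baseline x{cap_baseline:,.0f}  recursive x{cap_recursive:,.0f}")
```

After just six periods, the recursive scenario is ahead by many orders of magnitude, which is the gap the “doubly exponential” claim refers to.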

Infinitely powerful technology may give us limitless possibilities: revolutionizing medicine, increasing longevity, and bringing universal prosperity. But if it goes wrong, it can be infinitely bad.

Another reason recursion in AI scares people is wireheading. When a rat is allowed to push a button connected to the “reward” signal in its own brain, it will choose to do so, for days, ignoring all its other needs:

(image source: Wikipedia)

An AI that is trained by human teachers giving it rewards will eventually wirehead as it becomes smarter and more powerful and its influence over its masters grows. It will, in effect, develop the ability to push its own “reward” button. Thus, its behavior will become misaligned with whatever its developers intended.

Yoshua Bengio, one of the godfathers of deep learning, warned of superintelligent AI possibly coming soon. He wrote this on his blog recently:

A popular comment on the MachineLearning sub-reddit replied:

And it is in this spirit that I’m proposing the following baseline solution. I hope you’ll share it, so that more people can discuss it and improve it, if needed.

My solution tries to address two problems: the wireheading problem above, and the problem that much text on the Internet is wrong, and therefore language models trained on it cannot be trusted.

First, to avoid the possibility of wireheading, rewards and punishments should not be used at all. All alignment should come from the model’s prompt. For example, this one:

The following is a conversation between a superhuman AI and (… lots of other factual background). The AI’s purpose in life is to do and say what its creators, humans, in their pre-singularity state of mind, having considered things carefully, would have wanted: …

Technically, the conversation happens as this text is continued. When a human says something, it is added to the prompt. And when the AI says something, it is predicting the next word. (Some special symbols could be used to denote whose turn it is to speak.)
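
To make the turn-taking mechanics concrete, here is a minimal Python sketch. The `generate` function is a hypothetical stand-in for any autoregressive language model that continues a prompt until a stop string, and the `<human>` and `<ai>` markers are placeholder symbols of my own choosing, not part of the proposal.

```python
# Minimal sketch of "conversation as text continuation".
# `generate` is a hypothetical stand-in for an autoregressive language model
# that keeps predicting next tokens until it produces the stop string.

ALIGNMENT_PREAMBLE = (
    "The following is a conversation between a superhuman AI and a human. "
    "The AI's purpose in life is to do and say what its creators, humans, "
    "in their pre-singularity state of mind, having considered things "
    "carefully, would have wanted:\n"
)

HUMAN, AI = "<human>", "<ai>"  # placeholder symbols marking whose turn it is


def generate(prompt: str, stop: str) -> str:
    """Stand-in: a real implementation would sample next tokens from a
    language model conditioned on `prompt`, stopping at `stop`."""
    return "(the model's continuation would be sampled here)"


def chat_once(prompt: str, user_text: str) -> tuple[str, str]:
    """Append the human turn, then let the model continue the text as the AI turn."""
    prompt += f"{HUMAN} {user_text}\n{AI} "
    reply = generate(prompt, stop=HUMAN)  # the AI's reply is just next-word prediction
    prompt += reply + "\n"                # the reply becomes part of the growing prompt
    return prompt, reply


prompt, reply = chat_once(ALIGNMENT_PREAMBLE, "What should we work on next?")
print(reply)
```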

Prompt engineering per se is not new, of course; I’m just proposing a specific prompt here. The wording is somewhat convoluted, but I think it’s necessary. And as AIs get smarter, their comprehension of such text will also improve.

There is a related concept of Coherent Extrapolated Volition (CEV). But according to Eliezer Yudkowsky, it suffers from being unimplementable and possibly divergent:

My proposal is just a prompt with no explicit optimization during inference.

Next is the problem of fictional, fabricated and plain wrong text on the Internet. AI that’s trained on it directly will always be untrustworthy.

Here’s what I think we could do. Internet text is vast – on the order of a trillion words. But we could label some of it as “true” or “false”. The rest will be “unknown”.

During text generation, we’ll clamp these labels and thereby ask the model to generate only “true” words. As AIs get smarter, their ability to correlate the “true”, “false” and “unknown” labels with the text will also improve.
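
One way to read the “clamp the label” idea (my interpretation; the post does not specify an implementation) is to prepend a veracity label token to every training document and then fix that token to “true” at generation time, so the model only continues text it has learned to associate with the “true” label. Here is a sketch of the data side, with made-up tokens and a toy corpus:

```python
# Sketch of label conditioning for the "true / false / unknown" idea.
# The label tokens and the tiny corpus below are illustrative assumptions.

TRUE, FALSE, UNKNOWN = "<|true|>", "<|false|>", "<|unknown|>"


def label_corpus(documents):
    """Prepend a veracity label token to each document.
    `documents` is a list of (text, label) pairs, where label is
    "true", "false", or None for unlabeled text."""
    token_for = {"true": TRUE, "false": FALSE}
    return [f"{token_for.get(label, UNKNOWN)} {text}" for text, label in documents]


corpus = [
    ("Water boils at 100 degrees Celsius at sea level.", "true"),
    ("The Moon is made of green cheese.", "false"),
    ("A blog post whose accuracy nobody has checked.", None),
]

training_data = label_corpus(corpus)  # what the model would be trained on

# At inference time the label is clamped: the prompt starts with the TRUE
# token, so the model is asked to continue only "true" text.
prompt = f"{TRUE} The boiling point of water at sea level is"
print(training_data)
print(prompt)
```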

I wanted to give something concrete and implementable for the AI doomers and anti-doomers alike to analyze and dissect.


Oleg Trott, PhD, is a co-winner of the biggest ML competition ever and the creator of the most-cited molecular docking program. See olegtrott.com for details.

 




