LessWrong, July 14, 2024
LLMs as a Planning Overhang

This post examines two kinds of AI under development today, goal-optimising AIs and planner-simulator AIs, and their potential implications for existential risk to humanity. It notes that although current AIs already surpass humans in many domains, they are mostly not goal-directed; instead they behave more like simulations of helpful human behaviour. The author argues that such non-goal-directed AIs may amplify the risks posed by goal-directed AIs, and suggests ways to respond.

🔍 Goal-optimising vs. planner-simulating AI: the post distinguishes goal-optimising AIs, which have explicit objectives they try to maximise, from planner-simulator AIs, which simulate human behaviour.

💡 Risks from non-goal-optimising AI: although non-goal-directed AIs currently look harmless, they may act as an accelerant for goal-optimising AIs and thereby worsen existential risk.

🚀 The case for accelerating goal-directed AI research: to get a handle on goal-directed AI before hardware and planning/forecasting capabilities grow too far, it may be necessary to accelerate that research.

🛡️ Re-evaluating safety and alignment work: current safety/alignment work may need to be re-assessed to ensure it applies directly to the goal-maximising AIs of the future.

🤖 The unpredictability of AI development: even though AI progress has not followed the anticipated path, the eventual arrival of goal-directed AI could still pose major risks.

Published on July 14, 2024 2:54 AM GMT

It's quite possible someone has already argued this, but I thought I should share just in case not.

Goal-Optimisers and Planner-Simulators

When people in the past discussed worries about AI development, this was often about AI agents - AIs that had goals they were attempting to achieve, objective functions they were trying to maximise. At the beginning we would make fairly low-intelligence agents, which were not very good at achieving things, and then over time we would make them more and more intelligent. At some point around human level they would start to take off, because humans are approximately intelligent enough to self-improve, and self-improvement would be much easier in silicon.

This does not seem to be exactly how things have turned out. We have AIs that are much better than humans at many things, such that if a human had these skills we would think they were extremely capable. And in particular, LLMs are getting better at planning and forecasting, now beating many but not all people. But they remain worse than humans at other things, and, most importantly, the leading AIs do not seem to be particularly agentic - they do not have goals they are attempting to maximise; rather, they are just trying to simulate what a helpful redditor would say.

What is the significance for existential risk?

Some people seem to think this contradicts AI risk worries. After all, ignoring anthropics, shouldn’t the existence of human-competitive AIs that have so far caused no such problems be evidence against the risk of human-competitive AI?

I think this is not really the case, because you can take a lot of the traditional arguments and just substitute ‘agentic goal-maximising AIs, not just simulator-agents’ wherever people said ‘AI’, and the argument still works. It seems like eventually people are going to make competent goal-directed agents, and at that point we will indeed have the problems of their exerting more optimisation power than humanity.

In fact it seems like these non-agentic AIs might make things worse, because the goal-maximising agents will be able to use the non-agentic AIs.

Previously we might have hoped to have a period where we had goal-seeking agents that exerted influence on the world similar to a not-very-influential person, and that were not very good at planning or understanding the world. But if such an agent can query the forecasting LLMs and planning LLMs, then as soon as it ‘wants’ something in the real world it seems like it will be much more able to get it.
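As a purely illustrative sketch of this dynamic: a thin goal-directed wrapper gets a large capability boost the moment it can delegate its planning to a non-agentic planner LLM, without the wrapper itself getting any smarter. The functions `weak_internal_planner` and `query_planner_llm` below are hypothetical stubs, not real model APIs, so the example runs on its own.

```python
from typing import Callable, List


def weak_internal_planner(goal: str) -> List[str]:
    # The agent's own planning ability: shallow and generic.
    return [f"try something vaguely related to: {goal}"]


def query_planner_llm(goal: str) -> List[str]:
    # Hypothetical stand-in for a non-agentic planning/forecasting LLM.
    # In practice this would be an API call; here it just returns a
    # canned, higher-quality plan for illustration.
    return [
        f"decompose '{goal}' into concrete subgoals",
        "forecast likely obstacles and pick the most promising subgoal",
        "draft a detailed step-by-step plan for that subgoal",
        "execute, observe the results, and re-plan",
    ]


def goal_directed_agent(goal: str, planner: Callable[[str], List[str]]) -> List[str]:
    # The agentic part is tiny: it holds a goal and follows whatever plan
    # the planner returns, so its effective capability is mostly the
    # planner's capability.
    return planner(goal)


if __name__ == "__main__":
    goal = "acquire more compute"
    print("own planning:  ", goal_directed_agent(goal, weak_internal_planner))
    print("tool-LLM plans:", goal_directed_agent(goal, query_planner_llm))
```

The point of the sketch is only that the planner is interchangeable: nothing about the thin agentic wrapper has to improve for its effective competence to jump, which is the sense in which pre-existing planning and forecasting tools act as an overhang.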

So it seems like these planning/forecasting non-agentic AIs might represent a sort of planning overhang, analogous to a hardware overhang. They don’t directly give us existentially-threatening AIs, but they provide an accelerant for when agentic AIs do arrive.

How could we react to this?

One response would be to say that since agents are the dangerous thing, we should regulate/restrict/ban agentic AI development. In contrast, tool LLMs seem very useful and largely harmless, so we should promote them a lot and get a lot of value from them.

Unfortunately it seems like people are going to make AI agents anyway, because ML researchers love making things. So an alternative possible conclusion would be that we should actually try to accelerate agentic AI research as much as possible, because eventually we are going to have influential AI maximisers, and we want them to occur before the forecasting/planning overhang (and the hardware overhang) get too large.

I think this also makes some contemporary safety/alignment work look less useful. If you are making our tools work better, perhaps by understanding their internal workings better, you are also making them work better for the future AI maximisers who will be using them. Only if the safety/alignment work applies directly to the future maximiser AIs (for example, by allowing us to understand them) does it seem very advantageous to me.



