LessWrong, February 13
Not all capabilities will be created equal: focus on strategically superhuman agents

 

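The article examines when humanity would face existential risk from agentic AI systems. It proposes focusing on "strategically superhuman AI agents", identifies the relevant capabilities and ways to respond, and discusses some common objections.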

🎯 We should focus on "strategically superhuman AI agents", which outperform humans at real-world strategic action.

🧠 Real-world strategic capacity includes accurate modeling and prediction, social skills, and planning and resource acquisition.

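🛡️ To keep humanity from being at risk, AI agents must either have goals aligned with human interests or have limited capabilities.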

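💡 This framing gets closer to the heart of the problem than other milestones; it is still somewhat vague, but it is being refined.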

Published on February 13, 2025 1:24 AM GMT

When, exactly, should we consider humanity to have properly "lost the game", with respect to agentic AI systems?

The most common AI milestone concepts seem to be "artificial general intelligence", followed closely by "superintelligence". Sometimes people talk about "transformative AI", "high-level machine intelligence", or "full automation of the labor force." None of these are well-suited for pointing specifically at the capabilities that would spell a "point of no return" for humanity. In fact, they're all designed to be agnostic to exactly which capabilities will matter.

When working to predict and mitigate existential risks from AI agents, we should try to be as clear as possible about which capabilities we're concerned about. As a result, I think we should focus on "strategically superhuman AI agents": AI agents that are better than the best groups of humans at real-world strategic action.

Skill at real-world strategic action is context-dependent, and isn't a single capability any more than "intelligence" is a single capability: It refers to any of a broad space of situated skills. Among humans, these skills tend to be those possessed by world-class CEOs, military officers, and statesmen.

In the current strategic environment, real-world strategic capacity typically encompasses at least:

    - Accurately modeling and predicting the world in some broad domain, but especially modeling and predicting individual humans and groups of humans.
    - Social skills, including persuasion, manipulation, delegation, and coalition building.
    - Robust planning and resource acquisition on the scale of years, and the ability to adjust plans fluidly as situations change.

I claim that we will face existential risks from AI no sooner than the development of strategically human-level artificial agents, and that those risks are likely to follow soon after.

If we are going to build these agents without "losing the game", either (a) they must have goals that are compatible with human interests, or (b) we must (increasingly accurately) model and enforce limitations on their capabilities. If there's a day when an AI agent is created without either of these conditions, that's the day I'd consider humanity to have lost. We might not be immediately wiped out by a nanobot swarm, but from that time forward humans will be more like pawns than players, and when our replacement actuators have been built, we'll likely be left without the resources we need to survive.

Low-effort FAQ

What's the point here? Does anything interesting follow from this?

Here are some things that I think are interesting:

    - We don't actually need to build AGI proper, for most definitions of AGI, to manifest existential risks. It doesn't matter if your AI system is subhuman at physically solving Rubik's Cubes: it can pay or persuade human Rubik's Cube solvers to solve any Rubik's Cubes it needs solved.
    - Capabilities and controls are relevant to existential risks from agentic AI insofar as they provide or limit situated strategic power. Control schemes will require correctly identifying and limiting all sets of capabilities that would be sufficient for "escape".
    - I think this kind of capability could arise in many settings, since I think these are very broadly economically valuable capabilities for an agent to have. But I'm especially afraid of efforts to build CEO-bots, general-bots, and president-bots, since I think these are where this kind of capability is most obviously necessary in a way that rivals the most competitive real-world strategic capacities of humans.

Isn't this just as vague as other milestones?

Yes; I'm interested in trying to make it crisper. I do think it gets closer to the heart of the problem than "AGI" or "superintelligence", and that seems like an important step.

Won't this happen as soon as we get [AGI, recursive self-improvement, ...]?

Maybe, depending on details that aren’t obvious to me.

Sure, a system that’s better-than-the-best-human in all domains is by definition better-than-the-best-human in real-world strategy. But I don’t think people have a consistent definition of AGI, and a system that’s better-than-the-best-human in all domains will also have a bunch of irrelevant capabilities, which might actually be harder for AI systems to achieve than strategic capabilities.

At least in principle, you could have recursive self-improvement that wasn’t able to, or wasn’t aiming to, achieve superhuman strategic capabilities. E.g. an extremely fast AI R&D iteration loop would have to do almost all of its learning about humans “off-policy” (i.e., without getting to interact with real-time humans during training), and (while I don’t think this is plausible) it seems possible that you can’t reach superhuman strategic ability this way within realistic resource constraints.

Are you just trying to say "powerful AI"? That's too obvious to even mention.

I disagree: people do not in fact seem to be orienting to this type of threshold, which seems far more important than the thresholds they are orienting to.


