Will LLM agents become the first takeover-capable AGIs?

Published on March 2, 2025 5:15 PM GMT

One of my takeaways from EA Global this year was that most alignment people aren't explicitly focused on LLM-based agents (LMAs)[1] as a route to takeover-capable AGI. I want to better understand this position, since I estimate this path to AGI as likely enough (maybe around 60%) to be worth specific focus and concern.

Two reasons people might not care about aligning LMAs in particular:

    1. Thinking this route to AGI is quite possible, but that aligning LLMs mostly covers aligning LLM agents
    2. Thinking LLM-based agents are unlikely to be the first takeover-capable AGI

I'm aware of arguments/questions like Have LLMs Generated Novel Insights? and LLM Generality is a Timeline Crux, and of LLMs' weakness at what Steve Byrnes calls discernment: the ability to tell their better ideas/outputs from their worse ones.[2] I'm curious whether these or other ideas play a major role in your thinking.

I'm even more curious about the distribution of opinions around type 1 (aligning LLMs covers aligning LMAs) and type 2 (LMAs are not a likely route to AGI) in the alignment community.[3]

Edit: Based on the comments, I think perhaps this question is too broadly stated. The better question is "what sort of LMAs do you expect to reach takeover-capable AGI?" 
 

  1. ^

    For these purposes I want to consider language model agents (LMAs) broadly. I mean any sort of system that uses models that are substantially trained on human language, similar to current GPTs trained primarily to predict human language use.

    Agents based on language models could be systems with a lot of scaffolding or a little (including but not limited to hard-coded prompts for different cognitive purposes), along with other cognitive systems (including but not limited to dedicated one-shot memory systems and executive function/planning or metacognition systems). This is a large category of systems, but they have important similarities for alignment purposes: LLMs generate their "thoughts", while other systems direct and modify those "thoughts", to both organizing and chaotic effect. A minimal sketch of this division of labor follows at the end of this footnote.

    This of course includes multimodal foundation models with natural language training as a major component; most current things we call LLMs are technically foundation models. I think language training is the most important bit. I suspect that language training is remarkably effective because human language is a high-effort distillation of the world's semantics; but that is another story.
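    As a concrete illustration of the pattern described above, here is a minimal sketch in Python. It is my own toy example, not a design from this post: run_agent, llm, and the memory list are hypothetical names, and llm stands in for any text-to-text call to a language model. The point is only the division of labor: the LLM proposes "thoughts", while a hard-coded prompt, a memory store, and a simple executive loop direct and modify them.

```python
from typing import Callable, List

def run_agent(llm: Callable[[str], str], goal: str, max_steps: int = 5) -> List[str]:
    memory: List[str] = []  # dedicated memory system: stores past "thoughts"
    for _ in range(max_steps):
        # Executive function: a hard-coded prompt directs the LLM's next "thought"
        prompt = (
            f"Goal: {goal}\n"
            f"Recent thoughts: {memory[-3:]}\n"
            "Propose the next step, or say DONE if the goal is met."
        )
        thought = llm(prompt)
        # Crude metacognition: the scaffold, not the LLM, decides when to stop
        if "DONE" in thought:
            break
        memory.append(thought)  # modify the state the LLM sees on its next step
    return memory

# Usage with a trivial stand-in for a real model:
if __name__ == "__main__":
    responses = iter(["Outline a plan.", "Carry out the plan.", "DONE"])
    print(run_agent(lambda _prompt: next(responses), "illustrate the scaffold"))
```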

  2. ^

    I think that humans are also relatively weak at generating novel insights, generalizing, and discernment when using our System 1 processing. I think that agentic scaffolding and training are likely to improve System 2 strategies and skills similar to those humans use to scrape by in those areas. A toy sketch of one such System 2 scaffold follows below.
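    As one toy illustration (mine, not the author's) of a System 2 scaffold for discernment: sample several candidate answers, then ask the model to tell its better output from its worse ones. best_of_n and llm are hypothetical names; llm again stands in for any text-to-text model call.

```python
from typing import Callable

def best_of_n(llm: Callable[[str], str], task: str, n: int = 3) -> str:
    # Raw proposals (System 1): generate n candidate answers
    candidates = [llm(f"Task: {task}\nAnswer attempt {i + 1}:") for i in range(n)]
    listing = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    # Discernment step (System 2): a second pass ranks the model's own outputs
    verdict = llm(
        f"Task: {task}\nCandidate answers:\n{listing}\n"
        "Reply with only the number of the best candidate."
    )
    digits = "".join(ch for ch in verdict if ch.isdigit())
    index = int(digits) - 1 if digits else 0
    # Fall back to the first candidate if the verdict is unparsable or out of range
    return candidates[index] if 0 <= index < len(candidates) else candidates[0]
```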

  3. ^

    Here is my brief abstract argument for why no breakthroughs are needed for this route to AGI; this summarizes the plan for aligning such agents on short timelines; and System 2 Alignment is my latest in-depth prediction of how labs will try to align them by default, and how those methods could succeed or fail.


