Agent Engineering

This post takes a deep dive into swyx's keynote at the 2025 AI Engineer Summit and surveys the conference's talks on agent engineering. It first defines agents, then walks through the six elements of agent engineering: LLMs and tools, encoding intent, LLM-driven control flow, multi-step planning, long-running memory, and delegated authority. It also looks at where the field is heading, including improvements in models and tools and shifts in business models, and closes by underscoring the importance of agent engineering to AI and its likely future direction.

💡 **LLMs and tools:** The most foundational layer of agent engineering, covering tools such as RAG (retrieval-augmented generation), sandboxes/canvases, and browsers/CUA: the building blocks agents use to carry out tasks.

🎯 **Encoding intent:** Intent enters the system through multimodal input/output (such as voice) and is encoded into goals, which are verified against the environment through evals. This is key to how agents understand and execute tasks.

⚙️ **LLM-driven control flow:** LLMs sit on a spectrum between predefined workflows and autonomous agents; when the LLM decides the application's control flow, the agent becomes more autonomous (a minimal version of this loop is sketched just after this list).

🗓️ **Multi-step planning:** Advanced agents can carry out multi-step operations; the Devin/Manus agents, for example, expose editable plans, a marker of an agent's ability to solve complex problems.

🧠 **Long-running memory:** Agents need long-term memory to stay coherent and to self-improve. Beyond memory mechanisms like MemGPT/MCP, structured forms of memory such as Voyager's strengthen an agent's capacity for continual learning.

🤝 **Delegated authority:** Trust is an easily overlooked but critical part of agent engineering. In enterprise settings, an agent must be trusted to act on someone's behalf, which means its reliability and safety have to be established.
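
To make the "LLM-driven control flow" point concrete, here is a minimal sketch of an agent loop where the model, not hard-coded branching, picks the next action. `call_llm` and the toy tools are hypothetical stand-ins, not any particular framework's API.

```python
from typing import Callable

# Toy tools; a real agent would wire these to search APIs, sandboxes, browsers, etc.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: f"(pretend search results for {q!r})",
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # toy only, never eval untrusted input
}

def call_llm(goal: str, history: list[dict]) -> dict:
    """Hypothetical LLM call: a real agent would send the goal and history
    to a model and parse a JSON action like the dicts returned here."""
    if not history:  # first step: the model chooses a tool
        return {"action": "calculator", "input": "2 + 2"}
    return {"action": "finish", "output": "The answer is 4."}

def run_agent(goal: str, max_steps: int = 5) -> str:
    history: list[dict] = []
    for _ in range(max_steps):
        decision = call_llm(goal, history)      # the LLM owns the control flow
        if decision["action"] == "finish":
            return decision["output"]
        observation = TOOLS[decision["action"]](decision["input"])
        history.append({"decision": decision, "observation": observation})
    return "Stopped: step budget exhausted."

print(run_agent("What is 2 + 2?"))
```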

This post contains elaborations on swyx’s 2025 AI Engineer Summit keynote and also serves as a cohesive overview of a selection of Agents talks from the conference, which link-clickers can preview. You can find the original video and slides here.

If you enjoyed our Claude Plays Pokemon Lightning pod, we are doubling down with a Claude Plays Pokemon hackathon with David from Anthropic! Sign up here.


When we first asked ourselves what we’d do differently from Summit 2023 and WF 2024, the answer was a clearer focus1 on practical2 examples and techniques. After some debate, we finally decided to take “agent engineering” head on.

The first thing in discussing agent engineering is to achieve the simple task of defining agents.

slide updated with the IMPACT backronym

Defining Agents: A Linguistic Approach

Simon Willison, everyone’s favorite guest on LS and 2023 and 2024 AI Engineer keynoter, loves asking people for their agent definitions. It is an open secret that nobody agrees, and therefore debates about agent problems and frameworks are near-impossible since you can set the bar as low or as high as you want. Your choice of word is also strongly determined by your POV: intentionally or not, people always overemphasize where they start from and trivialize every perspective that starts elsewhere.

In fact, even within OpenAI the definitions disagree — on day 1 of the conference OpenAI released a new working definition for the Agents SDK:

An agent is an AI application consisting of

    a model equipped with

    instructions that guide its behavior,

    access to tools that extend its capabilities,

    encapsulated in a runtime with a dynamic lifecycle.

We’ll acronymize this as “TRIM”, but note what it DOESN’T say compared to OpenAI’s own Lilian Weng (now co-founder of Thinking Machines with Mira Murati) in her post:

Agent = LLM + memory + planning skills + tool use

Everyone agrees on Models and Tools, but TRIM forgets planning and memory, and Lilian takes prompts and runtime orchestration for granted.
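
As a concrete (and deliberately oversimplified) reading of the TRIM definition, here is what those four pieces look like as code. This is a hedged sketch, not the OpenAI Agents SDK: `complete` is a hypothetical stand-in for any model API, and the runtime does a single pass instead of a full tool-calling lifecycle.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    model: str                                   # M: the underlying LLM
    instructions: str                            # I: the behavior-guiding prompt
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)  # T: capability extensions

def complete(model: str, system: str, prompt: str) -> str:
    """Hypothetical model call; a real runtime would hit an LLM API here."""
    return f"[{model} | {system}] response to: {prompt}"

def runtime(agent: Agent, user_input: str) -> str:
    """R: the runtime owns the lifecycle -- prompt assembly, tool access, exit conditions."""
    # A real runtime would loop over tool calls; one pass keeps the sketch short.
    return complete(agent.model, agent.instructions, user_input)

helper = Agent(model="some-model", instructions="Be concise.")
print(runtime(helper, "Summarize the TRIM definition."))
```

Note what this sketch leaves out: nothing remembers anything between calls, plans ahead, or checks what it is allowed to do on the user’s behalf, which is exactly the gap the six elements below are meant to cover.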

Achieving common understanding of a word is not a technical matter but a linguistic one. And the most robust approach is descriptive, not prescriptive: that is, achieving a fully spanning (maybe MECE) understanding of how every serious party defines the word. Simon has collected over 250 replies — so I did the last mile of reading through all the groupings and applying human judgment…

The Six Elements of Agent Engineering

I’ve ranked them in rough descending order of commonality/importance:

When n > 3, acronyms can be helpful mnemonics, so we have taken the first letter of each element to form IMPACT5.

You can FEEL when an agent forgets one of these 6 things. OpenAI’s TRIM agent framework has no emphasis on memory, planning, or auth, and while these can all be categorized as existing at the tool layer, they take on special roles and meaning in agent engineering and probably should have a lot more care put into them.
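
Since memory, planning, and auth are the elements TRIM leaves implicit, here is the earlier sketch extended with those three pieces. Every name here is illustrative rather than taken from any framework; the point is only where the state lives and where the trust boundary sits.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    memory: list[str] = field(default_factory=list)         # long-running memory across steps and sessions
    plan: list[str] = field(default_factory=list)           # multi-step plan the agent (or a human) can edit
    granted_scopes: set[str] = field(default_factory=set)   # what the agent is trusted to do on our behalf

def remember(state: AgentState, note: str) -> None:
    state.memory.append(note)        # a real system might summarize, embed, or persist this

def authorized(state: AgentState, action_scope: str) -> bool:
    """Auth/delegation check: refuse anything outside the delegated scopes."""
    return action_scope in state.granted_scopes

state = AgentState(granted_scopes={"read:calendar"})
state.plan = ["look up free slots", "draft invite", "send invite"]
remember(state, "User prefers morning meetings.")
print(authorized(state, "send:email"))   # False -> this step needs escalation or human approval
```

The control-flow loop from earlier would consult this state before each tool call: read and append memory, re-check the plan, and refuse out-of-scope actions.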

Agents, Hot and Cold

We’ve tried to accurately report the general “it’s so over”/“we are so back” duality of man in the AI Eng scene over the past few years.

Spring 2023. In The Anatomy of Autonomy: Why Agents are the next AI Killer App after ChatGPT we tried to explain why the excitement of ChatGPT segued immediately into AutoGPT and BabyAGI (further explored with Itamar Friedman of Codium, now Qodo).

Fall 2023 - Spring 2024. Then came the nadir of sentiment in Why AI Agents Don't Work (yet) with Kanjun of Imbue, with the first OpenAI Dev Day launching custom GPTs to a flop and a subsequent board crisis. The Winds of AI Winter lasted all the way until David Luan asked us why Agents had become a bad word in Silicon Valley:

Summer 2024. The rebound came as Crew AI and LlamaIndex’s Agentic RAG became the most viewed talks at World’s Fair; our podcast on Llama 3 also introduced the first discussion of Llama 4’s focus on agents, which Soumith teased in his talk.

Fall 2024. It was Strawberry season, and with OpenAI hiring the top Agents researchers and releasing 100% reliable structured output and o1 in the API, reasoning models reignited the agent discussion in a very big way….

… if you also forgot about Claude 3.5, released in June and updated in Nov, which doubled Anthropic’s market share by simply being the best coding model and the model powering many SOTA agents like Bolt, Lindy, and Windsurf (talk):

All of which led up to Winter-Spring 2025, when OpenAI shot back with its first Operator and Deep Research agents and we went All In on Agent Engineering for NYC.

In fact, you can track ChatGPT’s growth numbers closely against model releases (as I did) and it is clear that the reacceleration of ChatGPT is all due to reasoning/agent work:

https://www.threads.net/@theturingpost/post/DGYk1P7oFCj/7-agents-everywhereheres-an-interesting-chart-of-chatgpt-according-to-swyx-its-g

However, we think this chronology tracking model progress and general sentiment swings isn’t even a complete account of the agent resurgence, which is still on-trend for those paying attention to broad benchmarks.

from m-ric of smolagents (our lightning pod with him). the agent horizon varies depending on reliability cutoff, but METR says it doubles every 3-7 months
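
A quick back-of-envelope on that doubling claim: if the task horizon an agent can handle doubles every 3 to 7 months, the compounding over a year looks like this (the numbers are illustrative arithmetic, not METR's data).

```python
def horizon_multiplier(months: float, doubling_period_months: float) -> float:
    """How much longer the doable tasks get after `months` of progress."""
    return 2 ** (months / doubling_period_months)

for d in (3, 7):
    print(f"doubling every {d} months -> {horizon_multiplier(12, d):.1f}x after 12 months")
# doubling every 3 months -> 16.0x after 12 months
# doubling every 7 months -> 3.3x after 12 months
```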

this slide in the talk

Why work on Agent Engineering Now?

This is why there’s a new resurgence in agents and the field of Agent Engineering is just now becoming the hottest thing in AI Engineering.

Full talk here

See me speed thru my slides on YouTube and leave a comment on what else you see!

1

Saying no to a lot of interesting directions in AI - focusing in on just one of the tracks we had last year but making a deep exploration of one topic rather than going wide

2

No direct vendor pitches; a draconian rule inspired by dbt’s Coalesce conference. This feels harsh because of course some of the people most qualified to talk about a problem also sell a solution for it; this meant we had to actively solicit talks outside the CFP process from people who would not normally apply to speak, like Bloomberg and LinkedIn and Jane Street, and the only way for a vendor to get on our stage is to also bring a customer to talk about their real lived experiences, like Method Financial/OpenPipe and Pfizer/Neo4j and Booking.com/Sourcegraph.

3

Rahul’s (Ramp’s) talk also frames the choice as a form of Bitter Lesson - workflows get you far in the short term, but often get steamrolled by the next order of magnitude gain in intelligence or cost/intelligence.

4

Agents that ask for confirmation before every single external action - many real agents (like Windsurf) have had to figure out clever ways of exempting actions from human approval in order for the agent to have meaningful autonomy.
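
A sketch of the pattern this footnote describes: gate external actions on human approval by default, but keep an exemption list so routine, reversible actions do not interrupt the agent. The policy below is illustrative, not how Windsurf or any specific product actually implements it.

```python
AUTO_APPROVED = {"read_file", "run_tests", "search_docs"}    # low-risk, reversible actions
ALWAYS_ASK    = {"send_email", "deploy", "delete_branch"}    # external or irreversible actions

def needs_human_approval(action: str) -> bool:
    if action in AUTO_APPROVED:
        return False
    return True   # default-deny: unknown actions and ALWAYS_ASK both get confirmed

for a in ("run_tests", "send_email", "rename_variable"):
    print(a, "-> ask human" if needs_human_approval(a) else "-> auto-run")
```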

5

“write agents with IMPACT!” too hokey? I like it because M, P, A, C, and T came naturally already, so the only armtwisty one was “Intent”, because I didn’t want to limit it to OpenAI TRIM’s “Instructions” alone — the combination of Instructions and Evals felt better to guide agent behavior in the same way that the generator-verifier gap works at the model level.
