原创 Ace人生 2025-03-09 21:20 浙江
AI Agent是如何工作的?从OpenManus的代码剖析。
题记
本周Manus这款AI Agent因其强大的功能和灵活的使用方式引起了广泛关注。然而,Manus需要邀请码才能使用,这让许多开发者和研究者望而却步。好消息是,开源社区迅速响应,推出了OpenManus项目,它不需要任何邀请码就能实现类似Manus的功能。
OpenManus是由MetaGPT团队的几位成员在短短3小时内构建的开源项目。通过阅读其代码,我们可以深入了解AI Agent的框架设计和实现细节,这对于想要构建自己的AI Agent的开发者来说是一个绝佳的学习资源。
AI Agent框架:以OpenManus为例
1. 整体架构
OpenManus采用了模块化的架构设计,包含多个核心组件:
OpenManus
├── Agent (代理层)
│ ├── BaseAgent (基础抽象类)
│ ├── ReActAgent (思考-行动模式)
│ ├── ToolCallAgent (工具调用能力)
│ ├── PlanningAgent (规划能力)
│ ├── SWEAgent (软件工程能力)
│ └── Manus (通用代理)
├── LLM (语言模型层)
├── Memory (记忆层)
├── Tool (工具层)
│ ├── BaseTool (工具基类)
│ ├── PlanningTool (规划工具)
│ ├── PythonExecute (Python执行)
│ ├── GoogleSearch (搜索工具)
│ ├── BrowserUseTool (浏览器工具)
│ └── ... (其他工具)
├── Flow (工作流层)
│ ├── BaseFlow (基础流程)
│ └── PlanningFlow (规划流程)
└── Prompt (提示词层)
这种模块化设计使得代码复用性高、扩展性强,并且职责分离清晰。
2. LLM组件
LLM(大型语言模型)是Agent的大脑,负责理解用户输入、生成响应和决策。OpenManus通过 LLM 类封装了与语言模型的交互:
class LLM:
_instances: Dict[str, "LLM"] = {} # 单例模式实现
def __init__(
self, config_name: str = "default", llm_config: Optional[LLMSettings] = None
):
if not hasattr(self, "client"): # 只初始化一次
llm_config = llm_config or config.llm
llm_config = llm_config.get(config_name, llm_config["default"])
self.model = llm_config.model
self.max_tokens = llm_config.max_tokens
self.temperature = llm_config.temperature
self.client = AsyncOpenAI(
api_key=llm_config.api_key, base_url=llm_config.base_url
)
LLM类提供了两个核心方法:
ask : 发送普通对话请求
ask_tool : 发送带工具调用的请求
async def ask_tool(
self,
messages: List[Union[dict, Message]],
system_msgs: Optional[List[Union[dict, Message]]] = None,
timeout: int = 60,
tools: Optional[List[dict]] = None,
tool_choice: Literal["none", "auto", "required"] = "auto",
temperature: Optional[float] = None,
**kwargs,
):
# 格式化消息
if system_msgs:
system_msgs = self.format_messages(system_msgs)
messages = system_msgs + self.format_messages(messages)
else:
messages = self.format_messages(messages)
# 发送请求
response = await self.client.chat.completions.create(
model=self.model,
messages=messages,
temperature=temperature or self.temperature,
max_tokens=self.max_tokens,
tools=tools,
tool_choice=tool_choice,
timeout=timeout,
**kwargs,
)
3. Memory组件
Memory组件负责存储和管理Agent的对话历史,使Agent能够保持上下文连贯性:
class Memory(BaseModel):
"""Stores and manages agent's conversation history."""
messages: List[Message] = Field(default_factory=list)
def add_message(self, message: Union[Message, dict]) -> None:
"""Add a message to memory."""
if isinstance(message, dict):
message = Message(**message)
self.messages.append(message)
def get_messages(self) -> List[Message]:
"""Get all messages in memory."""
return self.messages
Memory组件与Agent紧密集成,通过BaseAgent的update_memory方法添加新消息:
def update_memory(
self,
role: Literal["user", "system", "assistant", "tool"],
content: str,
**kwargs,
) -> None:
"""Add a message to the agent's memory."""
message_map = {
"user": Message.user_message,
"system": Message.system_message,
"assistant": Message.assistant_message,
"tool": lambda content, **kw: Message.tool_message(content, **kw),
}
if role notin message_map:
raise ValueError(f"Unsupported message role: {role}")
msg_factory = message_map[role]
msg = msg_factory(content, **kwargs) if role == "tool"else msg_factory(content)
self.memory.add_message(msg)
4. Tools组件
Tools是Agent与外部世界交互的接口。OpenManus实现了一个灵活的工具系统,以 BaseTool 为基础:
class BaseTool(ABC, BaseModel):
name: str
description: str
parameters: Optional[dict] = None
asyncdef __call__(self, **kwargs) -> Any:
"""Execute the tool with given parameters."""
returnawait self.execute(**kwargs)
@abstractmethod
asyncdef execute(self, **kwargs) -> Any:
"""Execute the tool with given parameters."""
def to_param(self) -> Dict:
"""Convert tool to function call format."""
return {
"type": "function",
"function": {
"name": self.name,
"description": self.description,
"parameters": self.parameters,
},
}
工具执行结果通过 ToolResult 类表示:
class ToolResult(BaseModel):
"""Represents the result of a tool execution."""
output: Any = Field(default=None)
error: Optional[str] = Field(default=None)
system: Optional[str] = Field(default=None)
OpenManus提供了多种内置工具,如 PlanningTool :
class PlanningTool(BaseTool):
"""
A planning tool that allows the agent to create and manage plans for solving complex tasks.
The tool provides functionality for creating plans, updating plan steps, and tracking progress.
"""
name: str = "planning"
description: str = _PLANNING_TOOL_DESCRIPTION
parameters: dict = {
"type": "object",
"properties": {
"command": {
"description": "The command to execute. Available commands: create, update, list, get, set_active, mark_step, delete.",
"enum": [
"create",
"update",
"list",
"get",
"set_active",
"mark_step",
"delete",
],
"type": "string",
},
# 其他参数...
},
"required": ["command"],
}
5. Planning组件
Planning组件是OpenManus的核心功能之一,它使Agent能够创建和管理计划,将复杂任务分解为可管理的步骤。Planning组件包括两个主要部分:
PlanningTool :提供计划创建、更新和跟踪的功能
PlanningAgent :使用PlanningTool进行任务规划和执行
class PlanningAgent(ToolCallAgent):
"""
An agent that creates and manages plans to solve tasks.
This agent uses a planning tool to create and manage structured plans,
and tracks progress through individual steps until task completion.
"""
name: str = "planning"
description: str = "An agent that creates and manages plans to solve tasks"
system_prompt: str = PLANNING_SYSTEM_PROMPT
next_step_prompt: str = NEXT_STEP_PROMPT
available_tools: ToolCollection = Field(
default_factory=lambda: ToolCollection(PlanningTool(), Terminate())
)
# 步骤执行跟踪器
step_execution_tracker: Dict[str, Dict] = Field(default_factory=dict)
current_step_index: Optional[int] = None
PlanningAgent的核心方法包括:
async def think(self) -> bool:
"""Decide the next action based on plan status."""
prompt = (
f"CURRENT PLAN STATUS:\n{await self.get_plan()}\n\n{self.next_step_prompt}"
if self.active_plan_id
else self.next_step_prompt
)
self.messages.append(Message.user_message(prompt))
# 获取当前步骤索引
self.current_step_index = await self._get_current_step_index()
result = await super().think()
# 关联工具调用与当前步骤
if result and self.tool_calls:
# ...关联逻辑...
return result
6. Flow组件
Flow组件用于管理多个Agent的协作,实现更复杂的任务处理流程:
class BaseFlow(BaseModel, ABC):
"""Base class for execution flows supporting multiple agents"""
agents: Dict[str, BaseAgent]
tools: Optional[List] = None
primary_agent_key: Optional[str] = None
@property
def primary_agent(self) -> Optional[BaseAgent]:
"""Get the primary agent for the flow"""
return self.agents.get(self.primary_agent_key)
@abstractmethod
asyncdef execute(self, input_text: str) -> str:
"""Execute the flow with given input"""
PlanningFlow是一个具体的Flow实现,用于规划和执行任务:
class PlanningFlow(BaseFlow):
"""A flow that manages planning and execution of tasks using agents."""
llm: LLM = Field(default_factory=lambda: LLM())
planning_tool: PlanningTool = Field(default_factory=PlanningTool)
executor_keys: List[str] = Field(default_factory=list)
active_plan_id: str = Field(default_factory=lambda: f"plan_{int(time.time())}")
current_step_index: Optional[int] = None
asyncdef execute(self, input_text: str) -> str:
"""Execute the planning flow with agents."""
try:
# 创建初始计划
if input_text:
await self._create_initial_plan(input_text)
# 执行计划步骤
whileawait self._has_next_step():
# 获取当前步骤
step_info = await self._get_current_step()
# 选择合适的执行者
executor = self.get_executor(step_info.get("type"))
# 执行步骤
result = await self._execute_step(executor, step_info)
# 更新步骤状态
await self._update_step_status(step_info["index"], "completed")
# 完成计划
returnawait self._finalize_plan()
except Exception as e:
# 处理异常
returnf"Error executing flow: {str(e)}"
OpenManus的实现:Agent关键代码
OpenManus的Agent采用了层次化的架构设计,从基础代理到专业代理逐层构建。这种设计使得代码复用性高、扩展性强,并且职责分离清晰。
BaseAgent (抽象基类)
└── ReActAgent (思考-行动模式)
└── ToolCallAgent (工具调用能力)
├── PlanningAgent (规划能力)
├── SWEAgent (软件工程能力)
└── Manus (通用代理)
1. BaseAgent:基础抽象类
BaseAgent 是整个框架的基础,它定义了代理的核心属性和方法:
class BaseAgent(BaseModel, ABC):
"""Abstract base class for managing agent state and execution."""
# 核心属性
name: str = Field(..., description="Unique name of the agent")
description: Optional[str] = Field(None, description="Optional agent description")
# 提示词
system_prompt: Optional[str] = Field(None, description="System-level instruction prompt")
next_step_prompt: Optional[str] = Field(None, description="Prompt for determining next action")
# 依赖组件
llm: LLM = Field(default_factory=LLM, description="Language model instance")
memory: Memory = Field(default_factory=Memory, description="Agent's memory store")
state: AgentState = Field(default=AgentState.IDLE, description="Current agent state")
# 执行控制
max_steps: int = Field(default=10, description="Maximum steps before termination")
current_step: int = Field(default=0, description="Current step in execution")
2. ReActAgent:思考-行动模式
ReActAgent 实现了"思考-行动"模式,将代理的执行分为两个阶段:
class ReActAgent(BaseAgent, ABC):
@abstractmethod
asyncdef think(self) -> bool:
"""Process current state and decide next action"""
@abstractmethod
asyncdef act(self) -> str:
"""Execute decided actions"""
asyncdef step(self) -> str:
"""Execute a single step: think and act."""
should_act = await self.think()
ifnot should_act:
return"Thinking complete - no action needed"
returnawait self.act()
3. ToolCallAgent:工具调用能力
ToolCallAgent 为代理添加了使用工具的能力:
class ToolCallAgent(ReActAgent):
"""Base agent class for handling tool/function calls with enhanced abstraction"""
available_tools: ToolCollection = ToolCollection(
CreateChatCompletion(), Terminate()
)
tool_choices: Literal["none", "auto", "required"] = "auto"
asyncdef think(self) -> bool:
# 获取LLM响应和工具选择
response = await self.llm.ask_tool(
messages=self.messages,
system_msgs=[Message.system_message(self.system_prompt)]
if self.system_prompt
elseNone,
tools=self.available_tools.to_params(),
tool_choice=self.tool_choices,
)
self.tool_calls = response.tool_calls
# 处理响应和工具调用
# ...
asyncdef act(self) -> str:
# 执行工具调用
results = []
for command in self.tool_calls:
result = await self.execute_tool(command)
# 添加工具响应到内存
# ...
results.append(result)
return"\n\n".join(results)
4. PlanningAgent:规划能力
PlanningAgent 实现了任务规划和执行跟踪:
class PlanningAgent(ToolCallAgent):
"""
An agent that creates and manages plans to solve tasks.
This agent uses a planning tool to create and manage structured plans,
and tracks progress through individual steps until task completion.
"""
# 步骤执行跟踪器
step_execution_tracker: Dict[str, Dict] = Field(default_factory=dict)
current_step_index: Optional[int] = None
asyncdef think(self) -> bool:
"""Decide the next action based on plan status."""
prompt = (
f"CURRENT PLAN STATUS:\n{await self.get_plan()}\n\n{self.next_step_prompt}"
if self.active_plan_id
else self.next_step_prompt
)
self.messages.append(Message.user_message(prompt))
# 获取当前步骤索引
self.current_step_index = await self._get_current_step_index()
result = await super().think()
# 关联工具调用与当前步骤
if result and self.tool_calls:
# ...关联逻辑...
return result
5. Manus:通用代理
Manus 是OpenManus的核心代理,它集成了多种工具和能力:
class Manus(ToolCallAgent):
"""
A versatile general-purpose agent that uses planning to solve various tasks.
This agent extends PlanningAgent with a comprehensive set of tools and capabilities,
including Python execution, web browsing, file operations, and information retrieval
to handle a wide range of user requests.
"""
name: str = "manus"
description: str = "A versatile general-purpose agent"
system_prompt: str = SYSTEM_PROMPT
next_step_prompt: str = NEXT_STEP_PROMPT
available_tools: ToolCollection = Field(
default_factory=lambda: ToolCollection(
PythonExecute(), GoogleSearch(), BrowserUseTool(), FileSaver(), Terminate()
)
)
Prompt在构建Agent系统中的重要作用
Prompt在构建Agent系统中扮演着至关重要的角色,它不仅定义了Agent的行为模式,还指导了Agent的决策过程和工具使用方式。
1. 系统Prompt:定义Agent的角色和能力
系统Prompt为Agent提供了基本的角色定义和行为指南:
SYSTEM_PROMPT = "You are OpenManus, an all-capable AI assistant, aimed at solving any task presented by the user. You have various tools at your disposal that you can call upon to efficiently complete complex requests. Whether it's programming, information retrieval, file processing, or web browsing, you can handle it all."
这个Prompt告诉Agent它是谁、能做什么,以及应该如何行动。通过这种方式,我们可以塑造Agent的"人格"和专业领域。
2. 规划Prompt:指导Agent进行任务分解和规划
规划Prompt指导Agent如何分解复杂任务并创建执行计划:
PLANNING_SYSTEM_PROMPT = """
You are an expert Planning Agent tasked with solving complex problems by creating and managing structured plans.
Your job is:
1. Analyze requests to understand the task scope
2. Create clear, actionable plans with the `planning` tool
3. Execute steps using available tools as needed
4. Track progress and adapt plans dynamically
5. Use `finish` to conclude when the task is complete
Available tools will vary by task but may include:
- `planning`: Create, update, and track plans (commands: create, update, mark_step, etc.)
- `finish`: End the task when complete
Break tasks into logical, sequential steps. Think about dependencies and verification methods.
"""
这个Prompt不仅告诉Agent它的角色是一个规划专家,还详细说明了它应该如何使用规划工具来分解任务、创建计划并跟踪进度。
3. 工具使用Prompt:指导Agent如何使用工具
工具使用Prompt指导Agent如何选择和使用合适的工具:
NEXT_STEP_PROMPT = """You can interact with the computer using PythonExecute, save important content and information files through FileSaver, open browsers with BrowserUseTool, and retrieve information using GoogleSearch.
PythonExecute: Execute Python code to interact with the computer system, data processing, automation tasks, etc.
FileSaver: Save files locally, such as txt, py, html, etc.
BrowserUseTool: Open, browse, and use web browsers.If you open a local HTML file, you must provide the absolute path to the file.
GoogleSearch: Perform web information retrieval
Based on user needs, proactively select the most appropriate tool or combination of tools. For complex tasks, you can break down the problem and use different tools step by step to solve it. After using each tool, clearly explain the execution results and suggest the next steps.
"""
这个Prompt详细介绍了每个工具的功能和使用场景,帮助Agent在面对不同任务时选择最合适的工具。
4. 动态Prompt生成
在OpenManus中,Prompt不仅是静态的,还可以动态生成和参数化。例如,在PlanningAgent中,系统会动态将当前计划状态注入到Prompt中:
async def think(self) -> bool:
"""Decide the next action based on plan status."""
prompt = (
f"CURRENT PLAN STATUS:\n{await self.get_plan()}\n\n{self.next_step_prompt}"
if self.active_plan_id
else self.next_step_prompt
)
self.messages.append(Message.user_message(prompt))
这种动态Prompt使Agent能够根据当前状态做出更合适的决策。
小结
通过对OpenManus代码的分析,我们可以看到一个完整的AI Agent框架应该包含以下几个关键组件:
Agent:从基础代理到专业代理的层次结构,实现不同级别的能力
BaseAgent:提供基础的状态管理和执行循环
ReActAgent:实现思考-行动模式
ToolCallAgent:添加工具调用能力
专业代理:如PlanningAgent、SWEAgent和Manus
LLM:封装与大型语言模型的交互,提供对话和工具调用能力
支持普通对话和工具调用
实现重试机制和错误处理
支持流式响应
Memory:管理对话历史和上下文
存储和检索消息
维护对话上下文
Tool:提供与外部世界交互的接口
基础工具抽象
多种专业工具实现
工具结果处理
Planning:实现任务规划和执行跟踪
计划创建和管理
步骤状态跟踪
动态调整计划
Flow:管理多个Agent的协作
任务分配
结果整合
流程控制
Prompt:指导Agent的行为和决策
系统Prompt定义角色
专业Prompt指导决策
动态Prompt生成
OpenManus的设计思路清晰,代码结构合理,是学习AI Agent实现的优秀范例。它的模块化设计使得开发者可以轻松扩展和定制自己的代理。
对于想要深入了解AI Agent或构建自己的Agent系统的开发者来说,OpenManus提供了一个很好的起点。通过学习其架构和实现,我们可以更好地理解AI Agent的工作原理和设计思路。