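05_LangChain Message Storage and Management

Message Storage Overview

When building an LLM-based conversational application, managing the conversation history is essential. LangChain offers several ways to store and manage that history, from simple in-memory storage to persistent database storage. This chapter introduces several common message storage methods and the scenarios they suit.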

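1. Storing Messages in Memory

The simplest approach is to keep the chat history in memory. The example below implements in-memory message storage with a global Python dictionary.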

from typing import Dict, List

# InMemoryChatMessageHistory is the in-memory implementation shipped with langchain_core
from langchain_core.chat_history import BaseChatMessageHistory, InMemoryChatMessageHistory
from langchain_core.messages import BaseMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

# Global dictionary that stores the history of each session
memory_store: Dict[str, BaseChatMessageHistory] = {}

# Get the history for a session, creating it on first use
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in memory_store:
        memory_store[session_id] = InMemoryChatMessageHistory()
    return memory_store[session_id]

# Create a simple chat chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a friendly AI assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])
chain = prompt | ChatOpenAI(model="gpt-3.5-turbo")

# Wrap the chain so that it reads and writes the message history
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

Using the default single configuration parameter, session_id:

# Invoke the chain and specify the session ID
response = chain_with_history.invoke(
    {"input": "Hello!"},
    config={"configurable": {"session_id": "user_123"}}
)
print(response)

# Invoke again; the history is preserved
response = chain_with_history.invoke(
    {"input": "My name is Xiaoming"},
    config={"configurable": {"session_id": "user_123"}}
)
print(response)

# Inspect the stored messages
print(memory_store["user_123"].messages)

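A Complete Example for a Real Project

Below is a more complete example that shows how to use in-memory storage in a real project to build a simple chatbot: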

import os
from typing import Dict

from langchain_core.chat_history import BaseChatMessageHistory, InMemoryChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

# Set the OpenAI API key (placeholder value; replace it with your own key)
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

# Global dictionary that stores the history of each session
memory_store: Dict[str, BaseChatMessageHistory] = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    """Get the history for a session, creating it on first use."""
    if session_id not in memory_store:
        memory_store[session_id] = InMemoryChatMessageHistory()
    return memory_store[session_id]

def create_chat_chain():
    """Create the chat chain."""
    # Define the system prompt and the chat template
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a friendly, helpful AI assistant. Keep your answers concise and accurate, and reply in Chinese."),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{input}")
    ])

    # Create the LLM
    model = ChatOpenAI(
        model="gpt-3.5-turbo",
        temperature=0.7,
    )

    # Create the basic chain
    chain = prompt | model

    # Wrap the chain so that it reads and writes the message history
    return RunnableWithMessageHistory(
        chain,
        get_session_history,
        input_messages_key="input",
        history_messages_key="history",
    )

def chat_with_bot(chain, user_input: str, session_id: str = "default"):
    """Send one user message to the chatbot and return the reply text."""
    response = chain.invoke(
        {"input": user_input},
        config={"configurable": {"session_id": session_id}}
    )
    return response.content

def main():
    """Run an interactive chat loop."""
    # Create the chat chain
    chat_chain = create_chat_chain()

    # User ID, used here as the session ID
    user_id = "user_123"

    print("Welcome to the chatbot! Type 'quit' to end the conversation.")

    while True:
        # Read the user input
        user_input = input("You: ")

        # Check whether the user wants to exit
        if user_input.lower() in ["退出", "quit", "exit"]:
            break

        # Get the bot response
        bot_response = chat_with_bot(chat_chain, user_input, user_id)

        # Print the bot response
        print(f"Bot: {bot_response}")

        # Print the current number of messages in the history
        history = get_session_history(user_id)
        print(f"[debug] Messages in history: {len(history.messages)}")

if __name__ == "__main__":
    main()

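Running this script starts an interactive chatbot that remembers the conversation history and takes earlier exchanges into account when it replies.

Configuring the Session Key

We can customize which configuration parameters are used to track the message history by passing a list of ConfigurableFieldSpec objects to the history_factory_config parameter: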

from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables import ConfigurableFieldSpec

# Define the history factory first; it now takes both user_id and conversation_id
def get_session_history(user_id: str, conversation_id: str) -> BaseChatMessageHistory:
    session_id = f"{user_id}_{conversation_id}"
    if session_id not in memory_store:
        memory_store[session_id] = InMemoryChatMessageHistory()
    return memory_store[session_id]

# Use two parameters: user_id and conversation_id
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="user_id",
            annotation=str,
            name="User ID",
            description="Unique identifier for the user",
            default="",
            is_shared=True,
        ),
        ConfigurableFieldSpec(
            id="conversation_id",
            annotation=str,
            name="Conversation ID",
            description="Identifier for a specific conversation",
            default="",
            is_shared=True,
        ),
    ],
)

# Invoke the chain
response = chain_with_history.invoke(
    {"input": "Hello!"},
    config={
        "configurable": {
            "user_id": "user_123",
            "conversation_id": "conv_456"
        }
    }
)
print(response)
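One caveat about the string key built in get_session_history: joining user_id and conversation_id with an underscore can map two different pairs to the same key when the IDs themselves contain underscores. The sketch below uses only the standard library, with plain lists standing in for chat history objects (an assumption for illustration, not LangChain API; the function names are hypothetical), to show the collision and how a tuple key avoids it.

```python
from typing import Dict, List, Tuple

# Stand-in stores: plain lists play the role of chat history objects.
string_store: Dict[str, List[str]] = {}
tuple_store: Dict[Tuple[str, str], List[str]] = {}


def history_by_string_key(user_id: str, conversation_id: str) -> List[str]:
    """Get or create a history keyed by a joined string (collision-prone)."""
    return string_store.setdefault(f"{user_id}_{conversation_id}", [])


def history_by_tuple_key(user_id: str, conversation_id: str) -> List[str]:
    """Get or create a history keyed by a (user_id, conversation_id) tuple."""
    return tuple_store.setdefault((user_id, conversation_id), [])


# Two different (user, conversation) pairs that both join to "a_b_c".
history_by_string_key("a_b", "c").append("message from user a_b")
history_by_string_key("a", "b_c").append("message from user a")

history_by_tuple_key("a_b", "c").append("message from user a_b")
history_by_tuple_key("a", "b_c").append("message from user a")

print(len(string_store))  # number of distinct histories with string keys
print(len(tuple_store))   # number of distinct histories with tuple keys
```

If the IDs in your application can never contain the separator, the joined string is fine; otherwise a tuple key (or a separator that is guaranteed not to occur in the IDs) keeps sessions apart.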

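2. Persisting Messages to Redis

In many cases the conversation history has to be persisted. RunnableWithMessageHistory is agnostic about how the get_session_history callable retrieves its chat message history. Below we show how to persist messages with Redis.

Setting Up the Redis Environment

First, install the Redis dependency: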

pip install "langchain-redis"

If you do not have an existing Redis deployment, you can start a local Redis Stack server:

docker run -d -p 6379:6379 -p 8001:8001 redis/redis-stack:latest

Storing the Message History in Redis

from langchain_redis import RedisChatMessageHistory

def get_redis_history(session_id: str) -> BaseChatMessageHistory:
    # Store the chat history in Redis; langchain_redis takes the connection URL as redis_url
    return RedisChatMessageHistory(
        session_id=session_id,
        redis_url="redis://localhost:6379"
    )

# Wrap the chain so that it uses the Redis-backed message history
redis_chain_with_history = RunnableWithMessageHistory(
    chain,
    get_redis_history,
    input_messages_key="input",
    history_messages_key="history",
)

# Call the chat interface and check whether Redis stores the history
response = redis_chain_with_history.invoke(
    {"input": "Hello! I am a user from China"},
    config={"configurable": {"session_id": "redis_user_123"}}
)
print(response)

# Invoke again
response = redis_chain_with_history.invoke(
    {"input": "Can you remember what I said earlier?"},
    config={"configurable": {"session_id": "redis_user_123"}}
)
print(response)

Querying the History in Redis

The history stored in Redis can be queried directly:

# Fetch the history stored in Redis
redis_history = get_redis_history("redis_user_123")
print("Message history stored in Redis:")
for message in redis_history.messages:
    print(f"{message.type}: {message.content}")
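To make the persistence idea concrete without requiring a Redis server, here is a minimal sketch of a history store backed by the standard-library sqlite3 module. SQLiteHistory and its methods are hypothetical names chosen for illustration; the class is not part of LangChain and does not subclass BaseChatMessageHistory, so it shows only the storage pattern (one row per message, keyed by session ID), not a drop-in replacement for RedisChatMessageHistory.

```python
import json
import sqlite3
from typing import Dict, List


class SQLiteHistory:
    """Hypothetical minimal history store; not a LangChain class."""

    def __init__(self, conn: sqlite3.Connection, session_id: str) -> None:
        self.conn = conn
        self.session_id = session_id
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT NOT NULL, "
            "payload TEXT NOT NULL)"
        )

    def add_message(self, role: str, content: str) -> None:
        """Append one message to this session as a JSON payload."""
        payload = json.dumps({"type": role, "content": content}, ensure_ascii=False)
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO messages (session_id, payload) VALUES (?, ?)",
                (self.session_id, payload),
            )

    @property
    def messages(self) -> List[Dict[str, str]]:
        """Return this session's messages in insertion order."""
        rows = self.conn.execute(
            "SELECT payload FROM messages WHERE session_id = ? ORDER BY id",
            (self.session_id,),
        ).fetchall()
        return [json.loads(row[0]) for row in rows]

    def clear(self) -> None:
        """Delete all messages that belong to this session."""
        with self.conn:
            self.conn.execute(
                "DELETE FROM messages WHERE session_id = ?", (self.session_id,)
            )


# ":memory:" keeps the demo self-contained; pass a file path for real persistence.
conn = sqlite3.connect(":memory:")
first = SQLiteHistory(conn, "sqlite_user_123")
first.add_message("human", "Hello! I am a user from China")
first.add_message("ai", "Hello! How can I help?")

# A second object for the same session ID sees the stored messages.
second = SQLiteHistory(conn, "sqlite_user_123")
for message in second.messages:
    print(f"{message['type']}: {message['content']}")
```

The point of the sketch is that the history object holds no state of its own: everything lives in the backing store, which is why a fresh object created for the same session ID can read the earlier messages.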

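3. Modifying the Chat History

Modifying the stored chat messages helps a chatbot handle situations such as context window limits, and can improve the conversation experience.

3.1 Trimming Messages

LLMs and chat models have limited context windows, and even when you are not hitting the limit directly, you may want to cap how much information the model has to process. One solution is to load and store only the most recent n messages: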

from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.messages import HumanMessage, AIMessage

# Create a history with some preloaded messages
history = InMemoryChatMessageHistory()
history.add_message(HumanMessage(content="My name is Xiaoming"))
history.add_message(AIMessage(content="Hello Xiaoming, nice to meet you!"))
history.add_message(HumanMessage(content="I like programming"))
history.add_message(AIMessage(content="Great! Programming is a valuable skill."))

# Define a function that trims the messages
def trim_messages(messages: List[BaseMessage], max_messages: int = 2) -> List[BaseMessage]:
    """Keep only the most recent max_messages messages."""
    return messages[-max_messages:] if len(messages) > max_messages else messages

# History factory that trims the stored history before returning it
def get_trimmed_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in memory_store:
        memory_store[session_id] = InMemoryChatMessageHistory()

    # Trim the history, keeping only the 2 most recent messages
    current_history = memory_store[session_id]
    trimmed_messages = trim_messages(current_history.messages, max_messages=2)

    # Clear the history and add the trimmed messages back, preserving their types
    current_history.clear()
    for message in trimmed_messages:
        current_history.add_message(message)

    return current_history

# Chain that uses the trimming history factory
trim_chain_with_history = RunnableWithMessageHistory(
    chain,
    get_trimmed_history,
    input_messages_key="input",
    history_messages_key="history",
)

Invoke the new chain and inspect the messages:

# Initialize the session with the preloaded history
memory_store["trim_session"] = history

# Invoke the chain
response = trim_chain_with_history.invoke(
    {"input": "My favorite programming language is Python"},
    config={"configurable": {"session_id": "trim_session"}}
)
print(response)

# Inspect the history after trimming
print("History after trimming:")
for message in memory_store["trim_session"].messages:
    print(f"{message.type}: {message.content}")

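You can see that the two oldest messages have been removed from the history and the latest exchange has been appended at the end. The next time the chain is invoked, trim_messages is called again, and only the two most recent messages are passed to the model.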
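Trimming by message count ignores how long each message is. As a variation, the sketch below keeps the most recent messages that fit within a character budget, which is only a rough proxy for a token budget. It uses plain (role, content) tuples as a stand-in for BaseMessage objects, and trim_to_budget and max_chars are hypothetical names introduced here for illustration.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content); a stand-in for BaseMessage


def trim_to_budget(messages: List[Message], max_chars: int) -> List[Message]:
    """Keep the most recent messages whose combined content fits in max_chars."""
    kept: List[Message] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that does not fit.
    for role, content in reversed(messages):
        if used + len(content) > max_chars:
            break
        kept.append((role, content))
        used += len(content)
    kept.reverse()  # restore chronological order
    return kept


conversation: List[Message] = [
    ("human", "My name is Xiaoming"),
    ("ai", "Hello Xiaoming, nice to meet you!"),
    ("human", "I like programming"),
    ("ai", "Great! Programming is a valuable skill."),
]

for role, content in trim_to_budget(conversation, max_chars=60):
    print(f"{role}: {content}")
```

A real application would count tokens with the model's tokenizer instead of characters; the loop structure stays the same.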

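3.2 Summary Memory

Another way to manage long conversations is to use an extra LLM call to generate a summary of the conversation instead of passing the full history to the model: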

from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.prompts import PromptTemplate

# Create a new chat history
summary_history = InMemoryChatMessageHistory()
summary_history.add_message(HumanMessage(content="My name is Xiaohong"))
summary_history.add_message(AIMessage(content="Hello Xiaohong, nice to meet you!"))
summary_history.add_message(HumanMessage(content="I am a doctor"))
summary_history.add_message(AIMessage(content="Being a doctor is a very noble profession! What is your specialty?"))

# Create a modified prompt that tells the LLM it will receive a summary
summary_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a friendly AI assistant. Here is a summary of the conversation so far: {summary}"),
    ("human", "{input}")
])
summary_chain = summary_prompt | ChatOpenAI(model="gpt-3.5-turbo")

# Define a function that summarizes the conversation history
def summarize_history(messages: List[BaseMessage]) -> str:
    """Summarize the conversation history."""
    if not messages:
        return "This is the start of the conversation."

    # Create the summarization prompt
    summarize_prompt = PromptTemplate.from_template(
        "Below is a conversation history. Summarize the important information in one or two sentences:\n{chat_history}"
    )

    # Format the conversation history as text
    chat_history_str = "\n".join(
        [f"{m.type}: {m.content}" for m in messages]
    )

    # Use the LLM to generate the summary
    summarizer = ChatOpenAI(model="gpt-3.5-turbo")
    summary = summarizer.invoke(summarize_prompt.format(chat_history=chat_history_str))
    return summary.content

# Get the session history together with its summary
def get_summarized_history(session_id: str) -> tuple[BaseChatMessageHistory, str]:
    if session_id not in memory_store:
        memory_store[session_id] = InMemoryChatMessageHistory()
        return memory_store[session_id], "This is the start of the conversation."

    # Fetch the current history and generate its summary
    current_history = memory_store[session_id]
    summary = summarize_history(current_history.messages)

    return current_history, summary

# Wrapper with the signature that RunnableWithMessageHistory expects
def get_history_with_summary(session_id: str) -> BaseChatMessageHistory:
    history, _ = get_summarized_history(session_id)
    return history

# Run the chain with the summary instead of the raw history
def run_with_summary(inputs: dict, session_id: str):
    history, summary = get_summarized_history(session_id)

    # Add the summary to the inputs
    inputs_with_summary = inputs.copy()
    inputs_with_summary["summary"] = summary

    # Invoke the chain
    response = summary_chain.invoke(inputs_with_summary)

    # Update the history with the new exchange
    history.add_message(HumanMessage(content=inputs["input"]))
    history.add_message(AIMessage(content=response.content))

    return response

# Initialize the session
memory_store["summary_session"] = summary_history

# Run the conversation with the summary
response = run_with_summary(
    {"input": "Can you tell me what you remember about me?"},
    "summary_session"
)
print(response.content)

# Inspect the chat history
print("\nChat history:")
for message in memory_store["summary_session"].messages:
    print(f"{message.type}: {message.content}")

# Generate the summary of the updated history (this makes another LLM call)
_, summary = get_summarized_history("summary_session")
print("\nGenerated summary:")
print(summary)

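Note that each call to run_with_summary generates a fresh summary from the stored history, which by then also contains the newly added messages. You can also design a hybrid approach in which a fixed number of recent messages stay in the chat history verbatim while the older messages are summarized.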
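The hybrid approach can be sketched as follows, assuming plain (role, content) tuples in place of BaseMessage objects and a placeholder summarizer in place of the LLM call. The names split_for_hybrid_memory, build_hybrid_context, fake_summarize, and keep_last are hypothetical and exist only for this illustration.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content); a stand-in for BaseMessage


def split_for_hybrid_memory(
    messages: List[Message], keep_last: int
) -> Tuple[List[Message], List[Message]]:
    """Split history into (older messages to summarize, recent messages to keep)."""
    if keep_last <= 0:
        return list(messages), []
    return messages[:-keep_last], messages[-keep_last:]


def build_hybrid_context(
    messages: List[Message],
    keep_last: int,
    summarize: Callable[[List[Message]], str],
) -> Tuple[str, List[Message]]:
    """Return a summary of the older messages plus the recent ones verbatim."""
    older, recent = split_for_hybrid_memory(messages, keep_last)
    summary = summarize(older) if older else "No earlier messages to summarize."
    return summary, recent


def fake_summarize(messages: List[Message]) -> str:
    """Placeholder for an LLM call such as summarize_history."""
    return f"Summary of {len(messages)} earlier messages."


conversation: List[Message] = [
    ("human", "My name is Xiaohong"),
    ("ai", "Hello Xiaohong, nice to meet you!"),
    ("human", "I am a doctor"),
    ("ai", "Being a doctor is a very noble profession!"),
]

summary, recent = build_hybrid_context(conversation, keep_last=2, summarize=fake_summarize)
print(summary)
for role, content in recent:
    print(f"{role}: {content}")
```

In a real chain, the summary would go into the system prompt and the recent messages into the history placeholder, so the model sees exact wording for the latest turns and a compressed view of everything before them.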

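Conclusion

LangChain offers several flexible ways to manage chat history, from simple in-memory storage to persistent database storage, along with methods for modifying and optimizing the history. Depending on the application, you can pick the method that best improves the conversation experience and performance.

For simple applications, in-memory storage is the most direct choice.

For applications that need persistence, Redis or another database store is an option.

For long conversations, trimming or summarization can be used to manage the context window.

With sensible management of the chat history, you can build conversational applications that feel smarter and more natural.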

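Chapter Summary

In this chapter we took a close look at message storage and management in LangChain, including:

In-memory storage

Redis persistence

Message management techniques

These techniques help developers build smarter, more natural conversational applications and improve the user experience. By choosing and combining them sensibly, you can tune the conversational behavior of the LLM for your specific use case.

Next chapter: 06_LangChain Advanced Applications - we will explore more advanced uses of LangChain, including agents, tool use, and building complex workflows.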