Juejin AI · July 8, 2024
LangChain step by step: installing and using LangChain

 

This article covers installing LangChain with pip and related usage, including building several kinds of retrieval chains to help with testing.

💻 Installing langchain via pip and the related setup, such as installing langchain-openai and setting the API key; also how to use prompt templates to guide the LLM's responses.

🔍 Building several retrieval chains, such as a simple LLM chain and a retrieval chain that is aware of conversation context, and describing the many ways LangSmith can help with testing.

📄 A detailed walkthrough of building a retrieval chain, including loading data, indexing it into a vector store, creating the retrieval chain, and invoking it.

🧐 Caveats when using these tools and features, such as replacing api_key with your own and having a sufficient account balance.

Installing langchain with pip

pip install langchain

Installing langsmith (optional)

According to the official docs, LangSmith is used to observe complex LLM call chains; it is optional.

Sign up for LangSmith, then fill in your API key; I will skip the details here.

export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="..."
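If you prefer configuring this from Python rather than the shell, the same variables can be set via `os.environ` before LangChain is imported. A minimal sketch; the key value is a placeholder you must replace with your own:

```python
import os

# Equivalent to the shell exports above; set these before importing langchain
# so the LangSmith tracing client can pick them up.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "..."  # placeholder: replace with your real key

print(os.environ["LANGCHAIN_TRACING_V2"])
```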

Trying out a few LangChain workflows

Trying a simple LLM chain

Install langchain-openai

pip install langchain-openai

Set the API key

export OPENAI_API_KEY="..."

Initialize before use

from langchain_openai import ChatOpenAI
llm = ChatOpenAI()

Once you have installed and initialized the LLM of your choice, we can try using it! Let's ask it what LangSmith is; since this is content that doesn't exist in its training data, it probably won't give a very good answer.

llm.invoke("how can langsmith help with testing?")

We can also use a prompt template to guide its response. A prompt template converts raw user input into input that is better suited for an LLM.

from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world class technical documentation writer."),
    ("user", "{input}")
])
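Conceptually, `from_messages` stores a list of (role, template) pairs and substitutes the user's variables at invoke time. A toy illustration of that substitution (for intuition only, not the real ChatPromptTemplate internals):

```python
# Hypothetical mini version of chat-message template formatting.
template = [
    ("system", "You are a world class technical documentation writer."),
    ("user", "{input}"),
]

def format_messages(template, **variables):
    """Fill each message template with the supplied variables."""
    return [(role, text.format(**variables)) for role, text in template]

messages = format_messages(template, input="how can langsmith help with testing?")
print(messages[1])  # ('user', 'how can langsmith help with testing?')
```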

We can also run this step ourselves by passing in documents directly (note: document_chain is created in the retrieval chain section further below):

from langchain_core.documents import Document
document_chain.invoke({
    "input": "how can langsmith help with testing?",
    "context": [Document(page_content="langsmith can let you visualize test results")]
})

Using the prompt and the llm together

from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()
# prompt is the prompt template; piping through output_parser converts the
# chat model's structured output into a plain string
chain = prompt | llm | output_parser
chain.invoke({"input": "how can langsmith help with testing?"})
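The `|` syntax is LangChain Expression Language (LCEL). Under the hood it composes runnables into a pipeline; the mechanism can be sketched with Python's `__or__` operator (a toy model, not the actual LCEL implementation):

```python
class Step:
    """Toy runnable: wraps a function and supports `|` composition."""
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # (self | other) runs self first, then feeds the result into other
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

prompt = Step(lambda d: f"Q: {d['input']}")
llm = Step(lambda text: {"content": text.upper()})   # stand-in for a chat model
parser = Step(lambda msg: msg["content"])            # stand-in for StrOutputParser

chain = prompt | llm | parser
print(chain.invoke({"input": "hello"}))  # Q: HELLO
```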

Putting everything together so far. Note that api_key must be replaced with your own, and the account must have a positive balance; otherwise the call fails with an error.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI()
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world class technical documentation writer."),
    ("user", "{input}")
])
output_parser = StrOutputParser()
chain = prompt | llm | output_parser
result = chain.invoke({"input": "how can langsmith help with testing?"})
print(result)

The result I got is shown below (the output seems to differ on every run, so treat it as a reference only):

Langsmith can help with testing in several ways:

    Automated Testing: Langsmith can be used to generate test data for automated testing scripts. By creating realistic and diverse test data, Langsmith can help ensure comprehensive test coverage.

    Performance Testing: Langsmith can generate large volumes of data to simulate real-world usage scenarios, allowing for performance testing of systems and applications under load.

    Data Validation: Langsmith can be used to validate the accuracy and integrity of data by generating test data sets that cover various edge cases and boundary conditions.

    Regression Testing: Langsmith can help streamline the testing process by quickly generating test data for regression testing, ensuring that new code changes do not introduce unexpected bugs or issues.

Overall, Langsmith can be a valuable tool for testing teams to improve the efficiency and effectiveness of their testing processes.

Issues
Note that ChatGPT requires a paid account with a remaining balance; otherwise it will fail with an error like the following:

openai.RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

Building a Retrieval Chain

To answer the original question ("how can langsmith help with testing?") properly, we need to provide the LLM with additional context. We can do this with retrieval. Retrieval is useful when you have too much data to pass to the LLM directly; a retriever lets you fetch only the most relevant pieces and pass those in.
In this process, we will look up relevant documents with a retriever and then pass them into the prompt. A retriever can be backed by anything (a SQL table, the internet, etc.), but in this example we will populate a vector store and use it as the retriever. For more on vector stores, see the relevant documentation.
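The idea of "fetch only the most relevant pieces" can be illustrated in plain Python, using word overlap as a crude stand-in for embedding similarity (a toy sketch, not how FAISS or a real retriever scores documents):

```python
def score(query, doc):
    """Crude relevance: number of query words that also appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query, docs, k=2):
    """Return the k documents with the highest overlap score."""
    return sorted(docs, key=lambda doc: score(query, doc), reverse=True)[:k]

docs = [
    "langsmith lets you visualize test results",
    "faiss is a library for vector similarity search",
    "beautifulsoup parses html documents",
]
top = retrieve("how can langsmith help with testing", docs, k=1)
print(top[0])  # langsmith lets you visualize test results
```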
First, we need to load the data we want to index. We will use WebBaseLoader for this, which requires installing BeautifulSoup:

pip install beautifulsoup4

After that we can import and use it:

from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://docs.smith.langchain.com/user_guide")
docs = loader.load()

Next, we need to index the data into a vector store. This requires a couple of components, namely an embedding model and a vectorstore.

For the embedding model, there are again examples for both API access and running a local model; here we use OpenAI as the example.

from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

Install a vector database

pip install faiss-cpu

With the vector database in place, we can build the index:

from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
vector = FAISS.from_documents(documents, embeddings)

Now that the data is indexed in a vector store, we can create a retrieval chain. This chain takes an input question, looks up relevant documents, then passes those documents along with the original question to the LLM and asks it to answer.

First, let's set up the chain that takes a question plus the retrieved documents and generates an answer.

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

llm = ChatOpenAI()
document_chain = create_stuff_documents_chain(llm, prompt)
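The "stuff" strategy simply concatenates ("stuffs") all retrieved documents into the {context} slot of the prompt. A toy sketch of that formatting step (an illustration only, not the real create_stuff_documents_chain):

```python
def stuff_documents(docs, question):
    """Join all documents and place them into the prompt's context slot."""
    context = "\n\n".join(docs)
    return (
        "Answer the following question based only on the provided context:\n"
        f"<context>\n{context}\n</context>\n"
        f"Question: {question}"
    )

filled = stuff_documents(
    ["langsmith can let you visualize test results", "langsmith logs traces"],
    "how can langsmith help with testing?",
)
print(filled)
```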

However, we want the documents to come from the retriever we just set up. That way, we can use the retriever to dynamically select the most relevant documents and pass them in for a given question.

from langchain.chains import create_retrieval_chain

retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)

We can now invoke this chain. It returns a dictionary; the LLM's response is under the "answer" key.

response = retrieval_chain.invoke({"input": "how can langsmith help with testing?"})
print(response["answer"])
# LangSmith offers several features that can help with testing:...

The final output I got is shown below (again, it seems to differ on every run; for reference only):

LangSmith can help with testing in several ways:

    Prototyping: LangSmith allows for quick experimentation between prompts, model types, retrieval strategies, and other parameters, enabling rapid understanding of how the model is performing and debugging where it is failing during the prototyping phase.

    Debugging: LangSmith tracing provides clear visibility and debugging information at each step of an LLM sequence, making it easier to identify and root-cause issues when things go wrong.

    Initial Test Set: Developers can create datasets and use them to run tests on their LLM applications. LangSmith also facilitates running custom evaluations to score test results.

    Comparison View: LangSmith offers a user-friendly comparison view for test runs to track and diagnose regressions in test scores across multiple revisions of an application.

    Playground: LangSmith provides a playground environment for rapid iteration and experimentation, allowing developers to quickly test out different prompts and models, and log every playground run in the system for future use.

    Beta Testing: LangSmith enables the collection of data on how LLM applications are performing in real-world scenarios, aiding in the curation of test cases to track regressions/improvements and the development of automatic evaluations.

    Capturing Feedback: Users can gather human feedback on the responses produced by their applications and attach feedback scores to logged traces, then filter on traces that have a specific feedback tag and score.

    Adding Runs to a Dataset: LangSmith enables the addition of runs as examples to datasets, expanding test coverage on real-world scenarios as the application progresses through the beta testing phase.

Overall, LangSmith supports testing by providing visibility, debugging tools, test creation and execution capabilities, comparison views, and environments for rapid iteration and experimentation.

Building a retrieval chain that is aware of conversation context

So far, the chain we created can only answer a single question. One of the main types of LLM applications people are building is chatbots. So how do we turn this chain into one that can answer follow-up questions? We can still use the create_retrieval_chain function, but we need to change two things:

    1. The retrieval step should no longer operate only on the most recent input, but should take the whole conversation history into account.
    2. The final LLM chain should likewise take the whole history into account.

Updating the retriever

To update retrieval, we will create a new chain. This chain takes the latest input (input) and the conversation history (chat_history) and uses the LLM to generate a search query.

from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

# First we need a prompt that we can pass into an LLM to generate this search query
prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
    ("user", "Given the above conversation, generate a search query to look up to get information relevant to the conversation")
])
retriever_chain = create_history_aware_retriever(llm, retriever, prompt)
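The point of the history-aware step is that a follow-up like "Tell me how" is useless as a search query on its own. A trivial stand-in that just concatenates history and input shows the shape of the transformation; the real chain instead asks the LLM to rewrite the follow-up into a proper standalone query:

```python
def naive_search_query(chat_history, user_input):
    """Toy stand-in: fold the conversation into the search query by concatenation."""
    history_text = " ".join(msg for _, msg in chat_history)
    return f"{history_text} {user_input}"

chat_history = [
    ("human", "Can LangSmith help test my LLM applications?"),
    ("ai", "Yes!"),
]
print(naive_search_query(chat_history, "Tell me how"))
```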

Let's try out the updated retriever:

from langchain_core.messages import HumanMessage, AIMessage

chat_history = [HumanMessage(content="Can LangSmith help test my LLM applications?"), AIMessage(content="Yes!")]
retriever_chain.invoke({
    "chat_history": chat_history,
    "input": "Tell me how"
})

If the output looks good, we can generate a new retrieval chain:

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the user's questions based on the below context:\n\n{context}"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
])
document_chain = create_stuff_documents_chain(llm, prompt)
retrieval_chain = create_retrieval_chain(retriever_chain, document_chain)

Invoke the new retrieval chain:

chat_history = [HumanMessage(content="Can LangSmith help test my LLM applications?"), AIMessage(content="Yes!")]
retrieval_chain.invoke({
    "chat_history": chat_history,
    "input": "Tell me how"
})

The output is as follows (it seems to differ on every run; for reference only):

LangSmith can help test your LLM applications by providing features like creating datasets for test cases, running custom evaluations, offering comparison views for different configurations, providing a playground environment for rapid iteration and experimentation, supporting beta testing with feedback collection and annotation queues, enabling feedback scoring on logged traces, allowing annotation of traces with different criteria, adding runs to datasets for real-world scenario coverage, and offering monitoring, A/B testing, automations, and thread views for multi-turn interactions.

References

https://python.langchain.com/docs/get_started/quickstart/#diving-deeper-1
