MarkTechPost@AI May 18, 11:20
How to Build a Powerful and Intelligent Question-Answering System by Using Tavily Search API, Chroma, Google Gemini LLMs, and the LangChain Framework

This article shows how to build a powerful question-answering system with the LangChain framework, the Tavily Search API, the Chroma database, and Google Gemini LLMs. The system combines Tavily's real-time web search, Chroma's semantic document caching, and the contextual response generation provided by the Gemini model. These tools are integrated through LangChain's modular components such as RunnableLambda, ChatPromptTemplate, and ConversationBufferMemory. The system also introduces a hybrid retrieval mechanism that checks cached embeddings first and only then performs a fresh web search. Retrieved documents are intelligently formatted, summarized, and passed to a structured LLM prompt, with attention to source attribution, user history, and confidence scoring. This pipeline suits advanced use cases such as research assistance, domain-specific summarization, and intelligent agents.

🌐 **Real-time web search integration**: The system uses the Tavily Search API for live web searches to ensure access to the latest information, combined with a Chroma database for semantic document caching that improves retrieval efficiency.

🧠 **Hybrid retrieval mechanism**: The system first looks for cached embeddings in the Chroma database and only calls Tavily for a fresh web search when no relevant cached results are found, streamlining the retrieval flow.

🗣️ **Intelligent document processing**: Retrieved documents are intelligently formatted and summarized before being passed to a structured LLM prompt, with attention to source attribution, user history, and confidence scoring, which improves the quality and trustworthiness of the output.

🛠️ **Use of the LangChain framework**: LangChain's modular components (such as RunnableLambda, ChatPromptTemplate, and ConversationBufferMemory) allow these tools to be integrated flexibly, making the system easier to extend and maintain.

In this tutorial, we demonstrate how to build a powerful and intelligent question-answering system by combining the strengths of Tavily Search API, Chroma, Google Gemini LLMs, and the LangChain framework. The pipeline leverages real-time web search using Tavily, semantic document caching with Chroma vector store, and contextual response generation through the Gemini model. These tools are integrated through LangChain’s modular components, such as RunnableLambda, ChatPromptTemplate, ConversationBufferMemory, and GoogleGenerativeAIEmbeddings. It goes beyond simple Q&A by introducing a hybrid retrieval mechanism that checks for cached embeddings before invoking fresh web searches. The retrieved documents are intelligently formatted, summarized, and passed through a structured LLM prompt, with attention to source attribution, user history, and confidence scoring. Key functions such as advanced prompt engineering, sentiment and entity analysis, and dynamic vector store updates make this pipeline suitable for advanced use cases like research assistance, domain-specific summarization, and intelligent agents.

!pip install -qU langchain-community tavily-python langchain-google-genai streamlit matplotlib pandas tiktoken chromadb langchain_core pydantic langchain

We install and upgrade a comprehensive set of libraries required to build an advanced AI search assistant. It includes tools for retrieval (tavily-python, chromadb), LLM integration (langchain-google-genai, langchain), data handling (pandas, pydantic), visualization (matplotlib, streamlit), and tokenization (tiktoken). These components form the core foundation for constructing a real-time, context-aware QA system.

import os
import getpass
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import json
import time
from typing import List, Dict, Any, Optional
from datetime import datetime

We import essential Python libraries used throughout the notebook. It includes standard libraries for environment variables, secure input, time tracking, and data types (os, getpass, time, typing, datetime). Additionally, it brings in core data science tools like pandas, matplotlib, and numpy for data handling, visualization, and numerical computations, as well as json for parsing structured data.

if "TAVILY_API_KEY" not in os.environ:    os.environ["TAVILY_API_KEY"] = getpass.getpass("Enter Tavily API key: ")   if "GOOGLE_API_KEY" not in os.environ:    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter Google API key: ")import logginglogging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')logger = logging.getLogger(__name__)

We securely initialize API keys for Tavily and Google Gemini by prompting users only if they’re not already set in the environment, ensuring safe and repeatable access to external services. It also configures a standardized logging setup using Python’s logging module, which helps monitor execution flow and capture debug or error messages throughout the notebook.

from langchain_community.retrievers import TavilySearchAPIRetriever
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.memory import ConversationBufferMemory

We import key components from the LangChain ecosystem and its integrations. It brings in the TavilySearchAPIRetriever for real-time web search, Chroma for vector storage, and GoogleGenerativeAI modules for chat and embedding models. Core LangChain modules like ChatPromptTemplate, RunnableLambda, ConversationBufferMemory, and output parsers enable flexible prompt construction, memory handling, and pipeline execution.

class SearchQueryError(Exception):
    """Exception raised for errors in the search query."""
    pass

def format_docs(docs):
    formatted_content = []
    for i, doc in enumerate(docs):
        metadata = doc.metadata
        source = metadata.get('source', 'Unknown source')
        title = metadata.get('title', 'Untitled')
        score = metadata.get('score', 0)

        formatted_content.append(
            f"Document {i+1} [Score: {score:.2f}]:\n"
            f"Title: {title}\n"
            f"Source: {source}\n"
            f"Content: {doc.page_content}\n"
        )
    return "\n\n".join(formatted_content)

We define two essential components for search and document handling. The SearchQueryError class creates a custom exception to manage invalid or failed search queries gracefully. The format_docs function processes a list of retrieved documents by extracting metadata such as title, source, and relevance score and formatting them into a clean, readable string.
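
As a quick illustration, here is a hypothetical smoke test (the titles, sources, and scores below are invented) that builds two in-memory Document objects and prints the string format_docs produces for them:

sample_docs = [
    Document(
        page_content="Breath of the Wild launched alongside the Nintendo Switch in March 2017.",
        metadata={"title": "Launch overview", "source": "https://example.com/botw-launch", "score": 0.91},
    ),
    Document(
        page_content="The game received widespread critical acclaim for its open-ended design.",
        metadata={"title": "Critical reception", "source": "https://example.com/botw-reviews", "score": 0.87},
    ),
]

# Each document is rendered with its index, score, title, source, and content,
# separated by blank lines so the LLM can later cite "Document 1", "Document 2", etc.
print(format_docs(sample_docs))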

class SearchResultsParser:
    def parse(self, text):
        try:
            if isinstance(text, str):
                import re
                import json
                json_match = re.search(r'{.*}', text, re.DOTALL)
                if json_match:
                    json_str = json_match.group(0)
                    return json.loads(json_str)
                return {"answer": text, "sources": [], "confidence": 0.5}
            elif hasattr(text, 'content'):
                return {"answer": text.content, "sources": [], "confidence": 0.5}
            else:
                return {"answer": str(text), "sources": [], "confidence": 0.5}
        except Exception as e:
            logger.warning(f"Failed to parse JSON: {e}")
            return {"answer": str(text), "sources": [], "confidence": 0.5}

The SearchResultsParser class provides a robust method for extracting structured information from LLM responses. It attempts to parse a JSON-like string from the model output, falling back to a plain-text response format if parsing fails. It gracefully handles both string outputs and message objects, ensuring consistent downstream processing. In case of errors, it logs a warning and returns a fallback response containing the raw answer, empty sources, and a default confidence score, enhancing the system’s fault tolerance.
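
To sanity-check the parser's two main paths, a small hypothetical test (the input strings are made up) can be run directly:

parser = SearchResultsParser()

# A response that embeds JSON: the regex extracts the object and json.loads() it.
print(parser.parse('Model output: {"answer": "2017", "sources": ["Document 1"], "confidence": 0.9}'))

# A plain-text response: no JSON is found, so the default structure is returned.
print(parser.parse("Breath of the Wild was released in 2017."))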

class EnhancedTavilyRetriever:
    def __init__(self, api_key=None, max_results=5, search_depth="advanced", include_domains=None, exclude_domains=None):
        self.api_key = api_key
        self.max_results = max_results
        self.search_depth = search_depth
        self.include_domains = include_domains or []
        self.exclude_domains = exclude_domains or []
        self.retriever = self._create_retriever()
        self.previous_searches = []

    def _create_retriever(self):
        try:
            return TavilySearchAPIRetriever(
                api_key=self.api_key,
                k=self.max_results,
                search_depth=self.search_depth,
                include_domains=self.include_domains,
                exclude_domains=self.exclude_domains
            )
        except Exception as e:
            logger.error(f"Failed to create Tavily retriever: {e}")
            raise

    def invoke(self, query, **kwargs):
        if not query or not query.strip():
            raise SearchQueryError("Empty search query")

        try:
            start_time = time.time()
            results = self.retriever.invoke(query, **kwargs)
            end_time = time.time()

            search_record = {
                "timestamp": datetime.now().isoformat(),
                "query": query,
                "num_results": len(results),
                "response_time": end_time - start_time
            }
            self.previous_searches.append(search_record)

            return results
        except Exception as e:
            logger.error(f"Search failed: {e}")
            raise SearchQueryError(f"Failed to perform search: {str(e)}")

    def get_search_history(self):
        return self.previous_searches

The EnhancedTavilyRetriever class is a custom wrapper around the TavilySearchAPIRetriever, adding greater flexibility, control, and traceability to search operations. It supports advanced features like limiting search depth, domain inclusion/exclusion filters, and configurable result counts. The invoke method performs web searches and tracks each query’s metadata (timestamp, response time, and result count), storing it for later analysis.
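
A rough usage sketch (this performs a live Tavily call, so it assumes TAVILY_API_KEY is set; the query is arbitrary):

news_retriever = EnhancedTavilyRetriever(max_results=3, search_depth="basic")

docs = news_retriever.invoke("latest LangChain release notes")
print(f"Retrieved {len(docs)} documents")

# Every call is recorded, so timing and result counts can be inspected afterwards.
print(news_retriever.get_search_history()[-1])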

class SearchCache:
    def __init__(self):
        self.embedding_function = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
        self.vector_store = None
        self.text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

    def add_documents(self, documents):
        if not documents:
            return

        try:
            if self.vector_store is None:
                self.vector_store = Chroma.from_documents(
                    documents=documents,
                    embedding=self.embedding_function
                )
            else:
                self.vector_store.add_documents(documents)
        except Exception as e:
            logger.error(f"Failed to add documents to cache: {e}")

    def search(self, query, k=3):
        if self.vector_store is None:
            return []

        try:
            return self.vector_store.similarity_search(query, k=k)
        except Exception as e:
            logger.error(f"Vector search failed: {e}")
            return []

The SearchCache class implements a semantic caching layer that stores and retrieves documents using vector embeddings for efficient similarity search. It uses GoogleGenerativeAIEmbeddings to convert documents into dense vectors and stores them in a Chroma vector database. The add_documents method initializes or updates the vector store, while the search method enables fast retrieval of the most relevant cached documents based on semantic similarity. This reduces redundant API calls and improves response times for repeated or related queries, serving as a lightweight hybrid memory layer in the AI assistant pipeline.
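
A minimal sketch of the caching behaviour, assuming GOOGLE_API_KEY is set since adding documents calls the Gemini embedding API (the document content below is invented):

cache = SearchCache()

cache.add_documents([
    Document(
        page_content="Chroma stores embeddings locally for fast similarity search.",
        metadata={"title": "Chroma notes", "source": "https://example.com/chroma"},
    ),
])

# Semantically similar queries are now answered from the cache instead of the web.
for doc in cache.search("how does Chroma cache embeddings?", k=1):
    print(doc.metadata.get("title"), "->", doc.page_content[:60])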

search_cache = SearchCache()
enhanced_retriever = EnhancedTavilyRetriever(max_results=5)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

system_template = """You are a research assistant that provides accurate answers based on the search results provided.
Follow these guidelines:
1. Only use the context provided to answer the question
2. If the context doesn't contain the answer, say "I don't have sufficient information to answer this question."
3. Cite your sources by referencing the document numbers
4. Don't make up information
5. Keep the answer concise but complete

Context: {context}
Chat History: {chat_history}"""

system_message = SystemMessagePromptTemplate.from_template(system_template)
human_template = "Question: {question}"
human_message = HumanMessagePromptTemplate.from_template(human_template)
prompt = ChatPromptTemplate.from_messages([system_message, human_message])

We initialize the core components of the AI assistant: a semantic SearchCache, the EnhancedTavilyRetriever for web-based querying, and a ConversationBufferMemory to retain chat history across turns. It also defines a structured prompt using ChatPromptTemplate, guiding the LLM to act as a research assistant. The prompt enforces strict rules for factual accuracy, context usage, source citation, and concise answering, ensuring reliable and grounded responses.
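
To verify the prompt wiring before running the full chain, the template can be rendered with placeholder values (the context and question below are dummies):

preview = prompt.invoke({
    "context": "Document 1 [Score: 0.90]:\nTitle: Example\nSource: https://example.com\nContent: ...",
    "question": "What does the example say?",
    "chat_history": [],
})

# Prints the system and human messages that would be sent to the Gemini model.
for message in preview.to_messages():
    print(f"[{message.type}] {message.content[:80]}")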

def get_llm(model_name="gemini-2.0-flash-lite", temperature=0.2, response_mode="json"):
    try:
        return ChatGoogleGenerativeAI(
            model=model_name,
            temperature=temperature,
            convert_system_message_to_human=True,
            top_p=0.95,
            top_k=40,
            max_output_tokens=2048
        )
    except Exception as e:
        logger.error(f"Failed to initialize LLM: {e}")
        raise

output_parser = SearchResultsParser()

We define the get_llm function, which initializes a Google Gemini language model with configurable parameters such as model name, temperature, and decoding settings (e.g., top_p, top_k, and max tokens). It ensures robustness with error handling for failed model initialization. An instance of SearchResultsParser is also created to standardize and structure the LLM’s raw responses, enabling consistent downstream processing of answers and metadata.

def plot_search_metrics(search_history):
    if not search_history:
        print("No search history available")
        return

    df = pd.DataFrame(search_history)

    plt.figure(figsize=(12, 6))
    plt.subplot(1, 2, 1)
    plt.plot(range(len(df)), df['response_time'], marker='o')
    plt.title('Search Response Times')
    plt.xlabel('Search Index')
    plt.ylabel('Time (seconds)')
    plt.grid(True)

    plt.subplot(1, 2, 2)
    plt.bar(range(len(df)), df['num_results'])
    plt.title('Number of Results per Search')
    plt.xlabel('Search Index')
    plt.ylabel('Number of Results')
    plt.grid(True)

    plt.tight_layout()
    plt.show()

The plot_search_metrics function visualizes performance trends from past queries using Matplotlib. It converts the search history into a DataFrame and draws two subplots: one showing response time per search and the other displaying the number of results returned. This aids in analyzing the system’s efficiency and search quality over time, helping developers fine-tune the retriever or identify bottlenecks in real-world usage.
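
Since the function only expects a list of dicts with query, num_results, and response_time keys, it can be exercised with synthetic records (the values below are fabricated) before any real searches are run:

fake_history = [
    {"timestamp": "2025-05-18T11:20:00", "query": "botw release year", "num_results": 5, "response_time": 1.42},
    {"timestamp": "2025-05-18T11:21:00", "query": "botw reception", "num_results": 4, "response_time": 0.97},
    {"timestamp": "2025-05-18T11:22:00", "query": "botw sales", "num_results": 3, "response_time": 1.10},
]

# Draws the two subplots: response time per search and result count per search.
plot_search_metrics(fake_history)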

def retrieve_with_fallback(query):
    cached_results = search_cache.search(query)

    if cached_results:
        logger.info(f"Retrieved {len(cached_results)} documents from cache")
        return cached_results

    logger.info("No cache hit, performing web search")
    search_results = enhanced_retriever.invoke(query)

    search_cache.add_documents(search_results)

    return search_results

def summarize_documents(documents, query):
    llm = get_llm(temperature=0)

    summarize_prompt = ChatPromptTemplate.from_template(
        """Create a concise summary of the following documents related to this query: {query}

        {documents}

        Provide a comprehensive summary that addresses the key points relevant to the query.
        """
    )

    chain = (
        {"documents": lambda docs: format_docs(docs), "query": lambda _: query}
        | summarize_prompt
        | llm
        | StrOutputParser()
    )

    return chain.invoke(documents)

These two functions enhance the assistant’s intelligence and efficiency. The retrieve_with_fallback function implements a hybrid retrieval mechanism: it first attempts to fetch semantically relevant documents from the local Chroma cache and, if unsuccessful, falls back to a real-time Tavily web search, caching the new results for future use. Meanwhile, summarize_documents leverages a Gemini LLM to generate concise summaries from retrieved documents, guided by a structured prompt that ensures relevance to the query. Together, they enable low-latency, informative, and context-aware responses.
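
A hedged usage sketch (both API keys are assumed to be set, and the queries are arbitrary): the first call typically misses the cache and triggers a Tavily search, while a semantically similar follow-up should be served from Chroma:

docs = retrieve_with_fallback("breath of the wild metacritic score")
print(f"First call returned {len(docs)} documents")

# A semantically similar follow-up should now hit the vector cache.
cached = retrieve_with_fallback("what score did breath of the wild get on metacritic?")
print(f"Second call returned {len(cached)} documents")

# Condense the retrieved documents into a query-focused summary with Gemini.
print(summarize_documents(docs, "breath of the wild metacritic score"))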

def advanced_chain(query_engine="enhanced", model="gemini-1.5-pro", include_history=True):
    llm = get_llm(model_name=model)

    if query_engine == "enhanced":
        retriever = lambda query: retrieve_with_fallback(query)
    else:
        retriever = enhanced_retriever.invoke

    def chain_with_history(input_dict):
        query = input_dict["question"]
        chat_history = memory.load_memory_variables({})["chat_history"] if include_history else []

        docs = retriever(query)
        context = format_docs(docs)

        result = prompt.invoke({
            "context": context,
            "question": query,
            "chat_history": chat_history
        })

        # Generate the answer first, then persist the exchange in conversation memory.
        response = llm.invoke(result)
        memory.save_context({"input": query}, {"output": response.content})

        return response

    return RunnableLambda(chain_with_history) | StrOutputParser()

The advanced_chain function defines a modular, end-to-end reasoning workflow for answering user queries using cached or real-time search. It initializes the specified Gemini model, selects the retrieval strategy (cached fallback or direct search), constructs a response pipeline incorporating chat history (if enabled), formats documents into context, and prompts the LLM using a system-guided template. The chain also logs the interaction in memory and returns the final answer, parsed into clean text. This design enables flexible experimentation with models and retrieval strategies while maintaining conversation coherence.

qa_chain = advanced_chain()

def analyze_query(query):
    llm = get_llm(temperature=0)

    analysis_prompt = ChatPromptTemplate.from_template(
        """Analyze the following query and provide:
        1. Main topic
        2. Sentiment (positive, negative, neutral)
        3. Key entities mentioned
        4. Query type (factual, opinion, how-to, etc.)

        Query: {query}

        Return the analysis in JSON format with the following structure:
        {{
            "topic": "main topic",
            "sentiment": "sentiment",
            "entities": ["entity1", "entity2"],
            "type": "query type"
        }}
        """
    )

    # Convert the model reply to text, then extract the JSON with SearchResultsParser
    # (piping the bound method lets LangChain wrap it in a RunnableLambda automatically).
    chain = analysis_prompt | llm | StrOutputParser() | output_parser.parse

    return chain.invoke({"query": query})

print("Advanced Tavily-Gemini Implementation")
print("="*50)
query = "what year was breath of the wild released and what was its reception?"
print(f"Query: {query}")

We initialize the final components of the intelligent assistant. qa_chain is the assembled reasoning pipeline ready to process user queries using retrieval, memory, and Gemini-based response generation. The analyze_query function performs a lightweight semantic analysis on a query, extracting the main topic, sentiment, entities, and query type using the Gemini model and a structured JSON prompt. The example query, about Breath of the Wild’s release and reception, showcases how the assistant is triggered and prepared for full-stack inference and semantic interpretation. The printed heading marks the start of interactive execution.

try:
    print("\nSearching for answer...")
    answer = qa_chain.invoke({"question": query})
    print("\nAnswer:")
    print(answer)

    print("\nAnalyzing query...")
    try:
        query_analysis = analyze_query(query)
        print("\nQuery Analysis:")
        print(json.dumps(query_analysis, indent=2))
    except Exception as e:
        print(f"Query analysis error (non-critical): {e}")
except Exception as e:
    print(f"Error in search: {e}")

history = enhanced_retriever.get_search_history()
print("\nSearch History:")
for i, h in enumerate(history):
    print(f"{i+1}. Query: {h['query']} - Results: {h['num_results']} - Time: {h['response_time']:.2f}s")

print("\nAdvanced search with domain filtering:")
specialized_retriever = EnhancedTavilyRetriever(
    max_results=3,
    search_depth="advanced",
    include_domains=["nintendo.com", "zelda.com"],
    exclude_domains=["reddit.com", "twitter.com"]
)

try:
    specialized_results = specialized_retriever.invoke("breath of the wild sales")
    print(f"Found {len(specialized_results)} specialized results")

    summary = summarize_documents(specialized_results, "breath of the wild sales")
    print("\nSummary of specialized results:")
    print(summary)
except Exception as e:
    print(f"Error in specialized search: {e}")

print("\nSearch Metrics:")
plot_search_metrics(history)

We demonstrate the complete pipeline in action. It performs a search using the qa_chain, displays the generated answer, and then analyzes the query for sentiment, topic, entities, and type. It also retrieves and prints each query’s search history, response time, and result count. Also, it runs a domain-filtered search focused on Nintendo-related sites, summarizes the results, and visualizes search performance using plot_search_metrics, offering a comprehensive view of the assistant’s capabilities in real-time use.

In conclusion, following this tutorial gives users a comprehensive blueprint for creating a highly capable, context-aware, and scalable RAG system that bridges real-time web intelligence with conversational AI. The Tavily Search API lets users directly pull fresh and relevant content from the web. The Gemini LLM adds robust reasoning and summarization capabilities, while LangChain’s abstraction layer allows seamless orchestration between memory, embeddings, and model outputs. The implementation includes advanced features such as domain-specific filtering, query analysis (sentiment, topic, and entity extraction), and fallback strategies using a semantic vector cache built with Chroma and GoogleGenerativeAIEmbeddings. Also, structured logging, error handling, and analytics dashboards provide transparency and diagnostics for real-world deployment.


Check out the Colab Notebook. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 90k+ ML SubReddit.

The post How to Build a Powerful and Intelligent Question-Answering System by Using Tavily Search API, Chroma, Google Gemini LLMs, and the LangChain Framework appeared first on MarkTechPost.
