MarkTechPost@AI · February 3
Creating a Medical Question-Answering Chatbot Using Open-Source BioMistral LLM, LangChain, Chroma’s Vector Storage, and RAG: A Step-by-Step Guide

This article walks through building a PDF-based medical question-answering chatbot with the open-source BioMistral LLM, LangChain, Chroma vector storage, and RAG. It explains how to process PDF documents, split them into text chunks, encode the chunks with Hugging Face embeddings, and store them in a Chroma vector database for efficient retrieval. Finally, a RAG system integrates the retrieved context into the chatbot's responses, ensuring answers are clear and authoritative. This approach makes it possible to rapidly process large volumes of medical PDFs and deliver context-rich, accurate, easy-to-understand insights.

🗂️ Load PDF documents with LangChain's PyPDFDirectoryLoader and split them into smaller text chunks with RecursiveCharacterTextSplitter for easier processing.

🧠 Convert the text chunks into numerical vectors with Hugging Face embeddings, capturing deep semantic relationships, and store the vectors in a Chroma vector database for efficient retrieval.

🔗 Use a Retrieval-Augmented Generation (RAG) system to integrate the retrieved context into the chatbot's responses, ensuring answers are accurate and relevant.

🤖 Initialize the BioMistral-7B model with LlamaCpp, configuring parameters such as temperature, maximum token count, and top_p to control text generation.

💬 Build a RAG chain that combines the retriever, a custom prompt, the LLM, and an output parser, so a user's question triggers retrieval of relevant information from the PDF documents and generation of a clear, readable answer.

In this tutorial, we’ll build a powerful, PDF-based question-answering chatbot tailored for medical or health-related content. We’ll leverage the open-source BioMistral LLM and LangChain’s flexible data orchestration capabilities to process PDF documents into manageable text chunks. We’ll then encode these chunks using Hugging Face embeddings, capturing deep semantic relationships and storing them in a Chroma vector database for high-efficiency retrieval. Finally, by employing a Retrieval-Augmented Generation (RAG) system, we’ll integrate the retrieved context directly into our chatbot’s responses, ensuring clear, authoritative answers for users. This approach allows us to rapidly sift through large volumes of medical PDFs, providing context-rich, accurate, and easy-to-understand insights.

Setting up tools

!pip install langchain sentence-transformers chromadb llama-cpp-python langchain_community pypdf

from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS, Chroma
from langchain_community.llms import LlamaCpp
from langchain.chains import RetrievalQA, LLMChain
import pathlib
import textwrap
from IPython.display import display, Markdown

def to_markdown(text):
    # Render text as a Markdown blockquote, turning bullets into list items.
    text = text.replace('•', '  *')
    return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

from google.colab import drive
drive.mount('/content/drive')

First, we install and configure Python packages for document processing, embedding generation, local LLMs, and advanced retrieval-based workflows with LlamaCpp. We leverage langchain_community for PDF loading and text splitting, set up RetrievalQA and LLMChain for question answering, and include a to_markdown utility plus Google Drive mounting.

Setting up API key access

from google.colab import userdata
# Or use os.getenv('HUGGINGFACEHUB_API_TOKEN') to fetch an environment variable.
import os
from getpass import getpass

HF_API_KEY = userdata.get("HF_API_KEY")
os.environ["HF_API_KEY"] = HF_API_KEY

Here, we securely fetch the Hugging Face API key from Colab's userdata store and set it as an environment variable. Alternatively, you can read the HUGGINGFACEHUB_API_TOKEN environment variable to avoid exposing sensitive credentials directly in your code; a sketch of that fallback follows.
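A minimal sketch of the environment-variable fallback, assuming the token was exported as HUGGINGFACEHUB_API_TOKEN beforehand (the prompt string is illustrative):

import os
from getpass import getpass

# Fallback sketch: read the token from the environment; prompt interactively
# only if it is missing, so the key never appears in the notebook source.
hf_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")
if hf_token is None:
    hf_token = getpass("Enter your Hugging Face token: ")
os.environ["HUGGINGFACEHUB_API_TOKEN"] = hf_token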

Loading and Extracting PDFs from a Directory

loader = PyPDFDirectoryLoader('/content/drive/My Drive/Data')
docs = loader.load()

We use PyPDFDirectoryLoader to scan the specified folder for PDFs, extract their text into a document list, and lay the groundwork for tasks like question answering, summarization, or keyword extraction.
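A quick sanity check (illustrative, not part of the original tutorial) confirms what loader.load() returned:

print(len(docs))                   # number of page-level documents loaded
print(docs[0].metadata)            # e.g. {'source': '.../file.pdf', 'page': 0}
print(docs[0].page_content[:200])  # first 200 characters of extracted text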

Splitting Loaded Text Documents into Manageable Chunks

text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = text_splitter.split_documents(docs)

In this code snippet, RecursiveCharacterTextSplitter is applied to break down each document in docs into smaller, more manageable segments.
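To see the effect of the split, a short illustrative check; the 50-character chunk_overlap means consecutive chunks from the same page share some text:

print(f"{len(docs)} pages -> {len(chunks)} chunks")
print(chunks[0].page_content)
print(chunks[1].page_content)  # its opening should overlap the end of chunks[0]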

Initializing Hugging Face Embeddings

embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")

Using HuggingFaceEmbeddings, we create an embedding object backed by the BAAI/bge-base-en-v1.5 model, which converts text into numerical vectors that capture semantic meaning.
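As a quick illustration, we can embed a sample sentence and inspect the result; bge-base-en-v1.5 produces 768-dimensional vectors:

# Illustrative: embed one sentence and inspect the resulting vector.
sample_vector = embeddings.embed_query("What are the symptoms of heart disease?")
print(len(sample_vector))  # 768 dimensions for bge-base-en-v1.5
print(sample_vector[:5])   # first few components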

Building a Vector Store and Running a Similarity Search

vectorstore = Chroma.from_documents(chunks, embeddings)

query = "who is at risk of heart disease"
search = vectorstore.similarity_search(query)
to_markdown(search[0].page_content)

We first build a Chroma vector store (Chroma.from_documents) from the text chunks and the specified embedding model. Next, we create a query asking, “who is at risk of heart disease,” and perform a similarity search against the stored embeddings. The top result (search[0].page_content) is then converted to Markdown for clearer display.

Creating a Retriever and Fetching Relevant Documents

retriever = vectorstore.as_retriever(search_kwargs={'k': 5})
retriever.get_relevant_documents(query)

We convert the Chroma vector store into a retriever (vectorstore.as_retriever) that efficiently fetches the most relevant documents for a given query. 
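An illustrative way to inspect what the retriever returns for our earlier query:

# Illustrative: print each retrieved chunk with its source file.
retrieved_docs = retriever.get_relevant_documents(query)
for i, doc in enumerate(retrieved_docs):
    source = doc.metadata.get("source", "unknown")
    print(f"[{i}] {source}: {doc.page_content[:100]}...")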

Initializing the BioMistral-7B Model with LlamaCpp

llm = LlamaCpp(
    model_path="/content/drive/MyDrive/Model/BioMistral-7B.Q4_K_M.gguf",
    temperature=0.3,
    max_tokens=2048,
    top_p=1
)

We set up an open-source local BioMistral LLM using LlamaCpp, pointing to a pre-downloaded model file. We also configure generation parameters such as temperature, max_tokens, and top_p, which control randomness, the maximum tokens generated, and the nucleus sampling strategy.
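Before wiring the model into a chain, a quick standalone test (illustrative) verifies that the GGUF file loads and the sampling settings behave:

# Illustrative: call the model directly, without retrieval.
raw_answer = llm.invoke("In one sentence, what is hypertension?")
print(raw_answer)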

Setting Up a Retrieval-Augmented Generation (RAG) Chain with a Custom Prompt

from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
from langchain.prompts import ChatPromptTemplate

# The {context} placeholder receives the documents fetched by the retriever.
template = """<|context|>
You are an AI assistant that follows instruction extremely well.
Please be truthful and give direct answers.
{context}</s>
<|user|>
{query}</s>
<|assistant|>"""

prompt = ChatPromptTemplate.from_template(template)

rag_chain = (
    {'context': retriever, 'query': RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

Using the above, we set up a RAG pipeline using the LangChain framework. It creates a custom prompt with instructions and placeholders, incorporates a retriever for context, and leverages a language model for generating answers. The flow is defined as a series of operations (RunnablePassthrough for direct query handling, the ChatPromptTemplate for prompt construction, the LLM for response generation, and finally, the StrOutputParser to produce a clean text string).
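One optional refinement, not part of the original chain: as written, the retriever's list of Document objects is interpolated into the prompt as-is, so its Python repr ends up in the context. A hedged sketch that joins the retrieved chunks into plain text first:

# Optional refinement (an assumption, not from the original tutorial):
# flatten the retrieved Documents into one plain-text context string.
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {'context': retriever | format_docs, 'query': RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)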

Invoking the RAG Chain to Answer a Health-Related Query

response = rag_chain.invoke("Why should I care about my heart health?")
to_markdown(response)

Now, we call the previously constructed RAG chain with a user’s query. It passes the query to the retriever, retrieves relevant context from the document collection, and feeds that context into the LLM to generate a concise, accurate answer.
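The chain can be reused for any number of follow-up questions; for example (with illustrative sample queries):

# Illustrative: reuse the chain for additional questions.
for question in ["What lifestyle changes reduce heart disease risk?",
                 "What are common symptoms of a heart attack?"]:
    print(question)
    print(rag_chain.invoke(question))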

In conclusion, by integrating BioMistral via LlamaCpp and taking advantage of LangChain’s flexibility, we are able to build a context-aware medical RAG chatbot. From chunk-based indexing to seamless RAG pipelines, it streamlines the process of mining large volumes of PDF data for relevant insights. Users receive clear and easily readable answers because final responses are formatted in Markdown. This design can be extended or tailored for various domains, ensuring scalability and precision in knowledge retrieval across diverse documents.


