MarkTechPost@AI 05月14日 15:20
A Step-by-Step Guide to Build a Fast Semantic Search and RAG QA Engine on Web-Scraped Data Using Together AI Embeddings, FAISS Retrieval, and LangChain

This article shows how to use Together AI's ecosystem to quickly stand up a question-answering service on top of web-scraped data. Web pages are scraped, split into chunks, embedded with the togethercomputer/m2-bert-80M-8k-retrieval model, and stored in a FAISS index for fast similarity search; a ChatTogether model then generates answers grounded in the retrieved content. The whole process avoids juggling multiple vendors and API keys, and with the LangChain framework the complete RAG pipeline, covering ingest, embed, store, retrieve, and converse, fits in roughly 50 lines of code.

🛠️ Using Together AI, web page content is scraped and then split with RecursiveCharacterTextSplitter into chunks of roughly 800 characters with a 100-character overlap, preparing it for high-quality embedding.

🧠 Together AI's 80M-parameter m2-bert retrieval model serves as the LangChain embedder; every text chunk is fed through it while FAISS builds an in-memory vector index, enabling millisecond cosine searches and turning the scraped pages into a searchable semantic database.

💬 LangChain's RetrievalQA combines the FAISS retriever (returning the 4 most similar chunks) with a ChatTogether model using the simple "stuff" prompt template, producing concise answers along with the original passages and source URLs they rely on, for transparent citations.

🔗 The whole pipeline is modular: FAISS can be swapped for Chroma, the embedding model can be replaced, or a reranker can be plugged in without touching the rest of the pipeline, thanks to Together AI's unified backend.

In this tutorial, we lean hard on Together AI’s growing ecosystem to show how quickly we can turn unstructured text into a question-answering service that cites its sources. We’ll scrape a handful of live web pages, slice them into coherent chunks, and feed those chunks to the togethercomputer/m2-bert-80M-8k-retrieval embedding model. Those vectors land in a FAISS index for millisecond similarity search, after which a lightweight ChatTogether model drafts answers that stay grounded in the retrieved passages. Because Together AI handles embeddings and chat behind a single API key, we avoid juggling multiple providers, quotas, or SDK dialects.

!pip -q install --upgrade langchain-core langchain-community langchain-together faiss-cpu tiktoken beautifulsoup4 html2text

This quiet (-q) pip command upgrades and installs everything the Colab RAG needs. It pulls core LangChain libraries plus the Together AI integration, FAISS for vector search, token-handling with tiktoken, and lightweight HTML parsing via beautifulsoup4 and html2text, ensuring the notebook runs end-to-end without additional setup.

import os, getpass, warnings, textwrap, json

if "TOGETHER_API_KEY" not in os.environ:
    os.environ["TOGETHER_API_KEY"] = getpass.getpass("Enter your Together API key: ")

We check whether the TOGETHER_API_KEY environment variable is already set; if not, it securely prompts us for the key with getpass and stores it in os.environ. The rest of the notebook can call Together AI’s API without hard‑coding secrets or exposing them in plain text by capturing the credentials once per runtime.

from langchain_community.document_loaders import WebBaseLoader

URLS = [
    "https://python.langchain.com/docs/integrations/text_embedding/together/",
    "https://api.together.xyz/",
    "https://together.ai/blog",
]
raw_docs = WebBaseLoader(URLS).load()

WebBaseLoader fetches each URL, strips boilerplate, and returns LangChain Document objects containing the clean page text plus metadata. By passing a list of Together-related links, we immediately collect live documentation and blog content that will later be chunked and embedded for semantic search.

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
docs = splitter.split_documents(raw_docs)
print(f"Loaded {len(raw_docs)} pages → {len(docs)} chunks after splitting.")

RecursiveCharacterTextSplitter slices every fetched page into ~800-character segments with a 100-character overlap so contextual clues aren’t lost at chunk boundaries. The resulting list docs holds these bite-sized LangChain Document objects, and the printout shows how many chunks were produced from the original pages, essential prep for high-quality embedding.

from langchain_together.embeddings import TogetherEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = TogetherEmbeddings(
    model="togethercomputer/m2-bert-80M-8k-retrieval"
)
vector_store = FAISS.from_documents(docs, embeddings)

Here we instantiate Together AI’s 80 M-parameter m2-bert retrieval model as a drop-in LangChain embedder, then feed every text chunk into it while FAISS.from_documents builds an in-memory vector index. The resulting vector store supports millisecond-level cosine searches, turning our scraped pages into a searchable semantic database.
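Before wiring up the chat model, the retrieval layer can be sanity-checked on its own by querying the FAISS store directly. The following is a small optional check of our own (not part of the original notebook) that uses LangChain's standard similarity_search method; the query string is purely illustrative.

# Optional check (not in the original notebook): query the vector store directly.
query = "Which embedding model does the Together integration use?"  # illustrative query
hits = vector_store.similarity_search(query, k=3)  # top-3 most similar chunks
for i, hit in enumerate(hits, 1):
    print(f"[{i}] {hit.metadata['source']}")
    print("    " + hit.page_content[:120].replace("\n", " "))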

from langchain_together.chat_models import ChatTogether

llm = ChatTogether(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    temperature=0.2,
    max_tokens=512,
)

ChatTogether wraps a chat-tuned model hosted on Together AI, here Mistral-7B-Instruct-v0.3, so it can be used like any other LangChain LLM. A low temperature of 0.2 keeps answers grounded and repeatable, while max_tokens=512 leaves room for detailed, multi-paragraph responses without runaway cost.
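To see that "like any other LangChain LLM" claim concretely, the model can also be called on its own, outside the RAG chain. The snippet below is our own illustration (not in the original notebook); the prompt string is arbitrary.

# Optional standalone check (not in the original notebook): invoke the chat model directly.
reply = llm.invoke("In one sentence, what is retrieval-augmented generation?")
print(reply.content)  # ChatTogether returns an AIMessage; .content holds the generated text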

from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True,
)

RetrievalQA stitches the pieces together: it takes our FAISS retriever (returning the top 4 similar chunks) and feeds those snippets into the llm using the simple “stuff” prompt template. Setting return_source_documents=True means each answer will return with the exact passages it relied on, giving us instant, citation-ready Q-and-A.

QUESTION = "How do I use TogetherEmbeddings inside LangChain, and what model name should I pass?"result = qa_chain(QUESTION)print("n Answer:n", textwrap.fill(result['result'], 100))print("n Sources:")for doc in result['source_documents']:    print(" •", doc.metadata['source'])

Finally, we send a natural-language query through the qa_chain, which retrieves the four most relevant chunks, feeds them to the ChatTogether model, and returns a concise answer. It then prints the formatted response, followed by a list of source URLs, giving us both the synthesized explanation and transparent citations in one shot.

Output from the Final Cell

In conclusion, in roughly fifty lines of code, we built a complete RAG loop powered end-to-end by Together AI: ingest, embed, store, retrieve, and converse. The approach is deliberately modular: swap FAISS for Chroma (a sketch of that swap follows below), trade the 80M-parameter embedder for Together's larger multilingual model, or plug in a reranker without touching the rest of the pipeline. What remains constant is the convenience of a unified Together AI backend: fast, affordable embeddings, chat models tuned for instruction following, and a generous free tier that makes experimentation painless. Use this template to bootstrap an internal knowledge assistant, a documentation bot for customers, or a personal research aide.
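As a rough illustration of that modularity, here is a minimal sketch (not from the original post) of the FAISS-to-Chroma swap: only the vector-store construction changes, assuming the chromadb package is installed, and the same documents and TogetherEmbeddings object are reused. Swapping the embedder is equally local, since it only means passing a different model string to TogetherEmbeddings.

# Minimal sketch of the swap (assumes `pip install chromadb`); the rest of the chain is untouched.
from langchain_community.vectorstores import Chroma

vector_store = Chroma.from_documents(docs, embeddings)  # same docs, same Together embedder
retriever = vector_store.as_retriever(search_kwargs={"k": 4})  # plugs straight into RetrievalQA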


Check out the Colab Notebook here. Also, feel free to follow us on Twitter and don’t forget to join our 90k+ ML SubReddit.

The post A Step-by-Step Guide to Build a Fast Semantic Search and RAG QA Engine on Web-Scraped Data Using Together AI Embeddings, FAISS Retrieval, and LangChain appeared first on MarkTechPost.
