LangChain application example code: 58. How to build and deploy a retrieval-augmented generation (RAG) app with Nomic's new embedding model

news2025/7/12 19:02:47

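Nomic embedding model

Nomic has released a new embedding model that performs well on long-context retrieval (8k context window).

This tutorial walks through building a RAG app with Nomic embeddings and deploying it with LangServe.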


Sign up

Get your API token, then run:

! nomic login

Then run the command again with the API token you generated:

! nomic login < token >

! pip install -U langchain-nomic langchain_community tiktoken langchain-openai chromadb langchain

# Optional: LangSmith API key (enables tracing)
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = "api_key"

Document loading

Let's test on 3 interesting blog posts.

from langchain_community.document_loaders import WebBaseLoader

urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]

# Load the content of each URL (each call returns a list of documents)
docs = [WebBaseLoader(url).load() for url in urls]
# Flatten the list of lists into a single list of documents
docs_list = [item for sublist in docs for item in sublist]

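Splitting

Long-context retrieval: the chunks are kept large so that each one stays just under the embedding model's 8k window.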

from langchain_text_splitters import CharacterTextSplitter

# Create a text splitter that measures chunk length in tiktoken tokens
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=7500, chunk_overlap=100
)
# Split the documents
doc_splits = text_splitter.split_documents(docs_list)

import tiktoken

# Get the tiktoken encoder for gpt-3.5-turbo (this resolves to cl100k_base)
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
# Print the token count of each split document
for d in doc_splits:
    print("The document is %s tokens" % len(encoding.encode(d.page_content)))
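
To make the chunk_size and chunk_overlap parameters concrete, here is a minimal sketch of a sliding-window splitter over a list of tokens. It is not how CharacterTextSplitter works internally (that splitter first splits on a separator and then merges pieces up to the size limit). The helper name split_tokens is hypothetical, and made-up token strings stand in for real tiktoken tokens.

```python
def split_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of at most chunk_size tokens,
    where consecutive windows share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window has reached the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Assumption: these placeholder strings stand in for real tokens.
tokens = [f"t{i}" for i in range(20)]
chunks = split_tokens(tokens, chunk_size=8, chunk_overlap=2)
for c in chunks:
    print(len(c), c[0], c[-1])
```

With chunk_size=7500 and chunk_overlap=100, as in the tutorial, the same idea yields chunks that fit inside an 8k-token window while repeating a small amount of text at each boundary.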

Indexing

See the Nomic embeddings reference.

import os

from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_nomic import NomicEmbeddings

# Add the splits to the vector database
vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma",
    embedding=NomicEmbeddings(model="nomic-embed-text-v1"),
)
retriever = vectorstore.as_retriever()
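
Conceptually, a retriever embeds the query, ranks the stored vectors by similarity, and returns the top matches. The sketch below illustrates that idea with hand-made 3-dimensional vectors in place of real Nomic embeddings. The names cosine and top_k are hypothetical helpers, and cosine similarity is an illustrative choice: the metric and index structure a real vector store uses depend on its configuration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec, index, k=2):
    """Return the texts of the k entries most similar to query_vec."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in scored[:k]]


# Assumption: made-up vectors standing in for real embeddings.
index = [
    ("agent memory", [0.9, 0.1, 0.0]),
    ("prompt engineering", [0.1, 0.9, 0.0]),
    ("adversarial attacks", [0.0, 0.1, 0.9]),
]
query = [0.8, 0.2, 0.0]
results = top_k(query, index, k=2)
print(results)
```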

RAG chain

We can use Mistral v0.2, which is fine-tuned for a 32k context.

We can run it locally with Ollama:

ollama pull mistral:instruct

We can also run GPT-4 with its 128k context window.

from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Prompt template
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# LLM API
model = ChatOpenAI(temperature=0, model="gpt-4-1106-preview")

# Local LLM
ollama_llm = "mistral:instruct"
model_local = ChatOllama(model=ollama_llm)

# Chain (uses the local model; swap in `model` to use GPT-4 instead)
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model_local
    | StrOutputParser()
)

# Question
chain.invoke("What are the types of agent memory?")
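
The chain pipes a question through four stages: retrieve, build the prompt, call the model, parse the output. The plain-Python analog below may help make that data flow explicit. Every function here is a hypothetical stand-in, not a LangChain API: fake_retriever returns canned text and fake_model only reports the prompt size. It also shows one optional refinement, joining the retrieved chunk texts into a single context string before prompting.

```python
def fake_retriever(question):
    # Stand-in for a retriever call; returns canned chunk texts
    return [
        "Short-term memory is in-context learning.",
        "Long-term memory is an external vector store.",
    ]


def format_docs(chunks):
    # Join chunk texts so the prompt receives clean text, not a list repr
    return "\n\n".join(chunks)


def build_prompt(context, question):
    return (
        "Answer the question based only on the following context:\n"
        f"{context}\n\nQuestion: {question}\n"
    )


def fake_model(prompt_text):
    # Stand-in for an LLM call; just reports how much text it received
    return {"content": f"[{len(prompt_text)} chars of prompt received]"}


def parse_output(message):
    # Plays the role of an output parser: extract the text content
    return message["content"]


def run_chain(question):
    context = format_docs(fake_retriever(question))
    prompt_text = build_prompt(context, question)
    return parse_output(fake_model(prompt_text))


answer = run_chain("What are the types of agent memory?")
print(answer)
```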

Mistral

Trace: 24k prompt tokens.

https://smith.langchain.com/public/3e04d475-ea08-4ee3-ae66-6416a93d8b08/r

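Needle-in-a-haystack analyses point to one consideration to keep in mind:

An LLM may struggle to retrieve information from a large context, depending on where in the context that information is placed.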
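
A needle-in-a-haystack test places a known fact (the needle) at varying depths inside long filler text and then asks the model to recall it. The sketch below covers only the prompt-construction step and calls no model. The helper insert_needle, the filler sentences, and the needle text are all made up for illustration.

```python
def insert_needle(haystack_sentences, needle, depth):
    """Place needle at a relative depth (0.0 = start, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    position = round(depth * len(haystack_sentences))
    return haystack_sentences[:position] + [needle] + haystack_sentences[position:]


# Assumption: synthetic filler text and a made-up fact to retrieve.
filler = [f"Filler sentence {i}." for i in range(10)]
needle = "The secret code is 4217."

for depth in (0.0, 0.5, 1.0):
    context = insert_needle(filler, needle, depth)
    print(depth, context.index(needle))
```

Running the same question against contexts built at different depths is one way to see whether answer quality depends on where the relevant chunk lands in the prompt.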

LangServe

Create a LangServe app.


$ conda create -n template-testing-env python=3.11
$ conda activate template-testing-env
$ pip install -U "langchain-cli[serve]" "langserve[all]"
$ langchain app new .
$ poetry add langchain-nomic langchain_community tiktoken langchain-openai chromadb langchain
$ poetry install

Add the logic above to a new file, chain.py.

Add the following to server.py:

from app.chain import chain as nomic_chain
add_routes(app, nomic_chain, path="/nomic-rag")

Run:

$ poetry run langchain serve
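
Once the server is up, the route added above can be called over HTTP. LangServe exposes an invoke endpoint under the route path that accepts a JSON body with an "input" field. The sketch below only builds the request so that it can be inspected without a running server. The base URL http://localhost:8000 is an assumption about the local setup, and build_invoke_request is a hypothetical helper.

```python
import json
import urllib.request


def build_invoke_request(question, base_url="http://localhost:8000"):
    """Build a POST request for the /nomic-rag/invoke endpoint."""
    payload = json.dumps({"input": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/nomic-rag/invoke",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invoke_request("What are the types of agent memory?")
print(request.full_url)

# To actually send it (requires the server to be running):
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read())["output"])
```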

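Summary

This document showed how to build and deploy a retrieval-augmented generation (RAG) app with Nomic's new embedding model. It covered:

An introduction to the Nomic embedding model
Setting up and logging in to the Nomic API
Loading and splitting documents
Creating a vector index with Nomic embeddings
Building the RAG chain from a retriever and a language model
Deploying the RAG app with LangServe

It also discussed considerations for long-context retrieval and the choice between language models such as Mistral and GPT-4.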

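Further background

Embedding model: An embedding model converts text into dense vectors that capture semantic information. Nomic's new model is optimized for long contexts (8k tokens).
RAG (retrieval-augmented generation): RAG combines a retrieval system with a generative model so that answers are grounded in retrieved documents, which improves the accuracy of a large language model's output.
LangChain: A framework for building applications on top of language models, providing a range of tools and abstractions.
LangServe: A LangChain component for deploying LangChain applications as an API.
Mistral and GPT-4: Two different large language models. Mistral is an open model that can run locally, while GPT-4 is a hosted OpenAI model; the GPT-4 Turbo preview used above has a context window of up to 128k tokens.
Vector store: A specialized database for storing and retrieving vector embeddings. This example uses Chroma.
Challenges of long-context retrieval: As the context grows longer, it becomes harder for the model to pick out the relevant information, and additional techniques may be needed to make sure the model uses a large context effectively.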