Topic: reducing the number of tokens passed to OpenAI when using LangChain
Background:
I am using LangChain to create embeddings and then ask a question to those embeddings like so:
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.vectorstores.base import VectorStoreRetriever

embeddings: OpenAIEmbeddings = OpenAIEmbeddings(disallowed_special=())
db = DeepLake(
    dataset_path=deeplake_url,
    read_only=True,
    embedding_function=embeddings,
)
retriever: VectorStoreRetriever = db.as_retriever()
model = ChatOpenAI(model_name="gpt-3.5-turbo")
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)
result = qa({"question": question, "chat_history": chat_history})
But I am getting the following error:
File "/xxxxx/openai/api_requestor.py", line 763, in _interpret_response_line
raise self.handle_error_response(
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 13918 tokens. Please reduce the length of the messages.
The chat_history is empty and the question is quite small.
How can I reduce the size of tokens being passed to OpenAI?
I'm assuming the documents retrieved via the embeddings are too large when passed to openai. It might be easy enough to just figure out how to truncate the data being sent to openai.
Solution:
Summary
When you initialize the ConversationalRetrievalChain object, pass in a max_tokens_limit amount.
qa = ConversationalRetrievalChain.from_llm(
    model, retriever=retriever, max_tokens_limit=4000
)
This will automatically truncate the tokens when sending the request to openai / your LLM.
Longer explainer
In the base.py of ConversationalRetrievalChain there is a function that is called when asking your question to deeplake/openai:
def _get_docs(self, question: str, inputs: Dict[str, Any]) -> List[Document]:
    docs = self.retriever.get_relevant_documents(question)
    return self._reduce_tokens_below_limit(docs)
This reads from the deeplake vector database and adds the retrieved documents' text as context to the prompt that is sent to openai.
_reduce_tokens_below_limit reads the class instance variable max_tokens_limit and truncates the input docs to fit under it.
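As a minimal sketch of that truncation logic (a simplified stand-in, not LangChain's actual implementation: it estimates tokens as whitespace-separated words, whereas the real chain counts tokens with the LLM's tokenizer), it drops trailing documents until the running total fits under the limit:

```python
from typing import List


def reduce_tokens_below_limit(docs: List[str], max_tokens_limit: int) -> List[str]:
    """Drop trailing docs until the estimated token total fits max_tokens_limit.

    Hypothetical sketch: approximates one token per whitespace-separated word.
    """
    tokens = [len(doc.split()) for doc in docs]
    num_docs = len(docs)
    token_count = sum(tokens)
    # Remove documents from the end of the retrieved list until we fit.
    while token_count > max_tokens_limit and num_docs > 0:
        num_docs -= 1
        token_count -= tokens[num_docs]
    return docs[:num_docs]
```

Because the least relevant documents are dropped first (retrievers return results ordered by relevance), the best-matching context survives the cut.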