Problem solved: Exceeding the maximum token limit in Azure OpenAI (with Java)

news2025/4/16 21:11:55

Background:

I'm building a chat that returns queries based on the questions you ask it about a specific database. For this I use Azure OpenAI and Java with Spring Boot.

Here is my problem:

How can I make the AI remember previous questions without passing the context back to it each time? I want to greatly reduce token consumption. Depending on the question, if it contains a keyword such as 'users', I add that table's information to the context (field names, data types, and descriptions), and that information is huge. After several questions, token usage rises to more than 10,000.

I can't show all the code, since it's a project for my company.

What I'm currently doing is adding the referenced table and the main context ("you are a SQL-based chat...") to the context. To make the chat remember, I tried saving the history in Java and passing it back as context, but this exceeds the token limit pretty fast.

This is what I'm currently doing (the AI remembers nothing between questions):

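// System message: the main context plus the description of the referenced table.
chatMessages.add(new ChatMessage(ChatRole.SYSTEM, context));
// User message: only the current question, so no earlier turns are sent.
chatMessages.add(new ChatMessage(ChatRole.USER, question));
ChatCompletions chatCompletions = client.getChatCompletions(deploymentOrModelId, new ChatCompletionsOptions(chatMessages));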

Solution:

As far as I know, there is no cheap way to make the LLM (Azure OpenAI in this case) remember your context. As you said, sending context (and a huge chunk of it) on every call gets pricey really fast. That being said, you could change the approach and try other techniques that mimic memory, such as summarizing the previous questions and sending that summary as content. Instead of a long string with 20 questions and answers, you send a short summary of what the user has been asking for. This keeps your prompt short while leaving the model kind of "aware" of the conversation.
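A minimal sketch of the summary idea, in plain Java so it does not depend on any particular SDK version. The class name SummaryChatMemory and the summarizer hook are hypothetical names made up for this illustration. In a real application the summarizer would most likely be one more small model call that condenses the previous summary and the latest exchange, but that part is an assumption and is left to the caller.

```java
import java.util.function.BinaryOperator;

/** Carries a short running summary of the conversation instead of the full transcript. */
class SummaryChatMemory {

    // Hypothetical hook: (previous summary, latest exchange) -> updated summary.
    // How the summary is produced (for example by another model call) is up to the caller.
    private final BinaryOperator<String> summarizer;
    private String summary = "";

    SummaryChatMemory(BinaryOperator<String> summarizer) {
        this.summarizer = summarizer;
    }

    /** Folds a finished question/answer exchange into the running summary. */
    void remember(String question, String answer) {
        String exchange = "User asked: " + question + "\nAssistant answered: " + answer;
        summary = summarizer.apply(summary, exchange);
    }

    /** Builds the system prompt: the base context plus the summary, if there is one yet. */
    String systemPrompt(String baseContext) {
        if (summary.isEmpty()) {
            return baseContext;
        }
        return baseContext + "\n\nSummary of the conversation so far:\n" + summary;
    }
}
```

With something like this, the string passed as the SYSTEM message would be memory.systemPrompt(context) rather than context alone, and memory.remember(question, answer) would be called after each reply. The summary still costs tokens, both to produce and to send, so the saving depends on keeping it much shorter than the history it replaces.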

There are also conversation buffers (keeping the chat history in memory and sending it to the LLM each time, as you did), but the history gets long pretty fast. For that you could configure a buffer window, for example limiting the conversation memory to the last 3 questions, which should help keep the token count manageable.
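The buffer window could be sketched roughly as below. This is an illustration only: WindowedChatMemory and Exchange are hypothetical names, not part of the Azure SDK, and the window size of 3 mentioned above is just an example value.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Keeps only the most recent question/answer exchanges of a conversation. */
class WindowedChatMemory {

    /** One past exchange: what the user asked and what the model answered. */
    static final class Exchange {
        final String question;
        final String answer;

        Exchange(String question, String answer) {
            this.question = question;
            this.answer = answer;
        }
    }

    private final int maxExchanges;
    private final Deque<Exchange> window = new ArrayDeque<>();

    WindowedChatMemory(int maxExchanges) {
        if (maxExchanges < 1) {
            throw new IllegalArgumentException("maxExchanges must be at least 1");
        }
        this.maxExchanges = maxExchanges;
    }

    /** Stores a finished exchange and evicts the oldest ones once the window is full. */
    void remember(String question, String answer) {
        window.addLast(new Exchange(question, answer));
        while (window.size() > maxExchanges) {
            window.removeFirst();
        }
    }

    /** Returns a copy of the retained exchanges, oldest first. */
    List<Exchange> recent() {
        return new ArrayList<>(window);
    }
}
```

When building a request, one option is to add the SYSTEM message first, then a USER and an ASSISTANT message for each entry of recent(), and finally the new question. The token cost per call is then bounded by the window size rather than by the length of the whole conversation, at the price of forgetting anything older than the window.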

There are several ways to manage this, but as far as I know there is no "perfect memory", or at least not one that is worth paying for. If you could tell us a bit more about how good the bot's memory needs to be, or about the specific use case, maybe we can be more precise. Good luck!