Contents
- Chat completions Beta
- Preface
- Introduction
- Response format
- Managing tokens
- Counting tokens for chat API calls
- Instructing chat models
- Chat vs Completions
- FAQ
- Other resources to download
Chat completions Beta
Using the OpenAI Chat API, you can build your own applications with gpt-3.5-turbo and gpt-4 to do things like:
- Draft an email or other piece of writing
- Write Python code
- Answer questions about a set of documents
- Create conversational agents
- Give your software a natural language interface
- Tutor in a range of subjects
- Translate languages
- Simulate characters for video games and much more
This guide explains how to make an API call for chat-based language models and shares tips for getting good results. You can also experiment with the new chat format in the OpenAI Playground.
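Preface
ChatGPT's chat interaction is the most powerful tool for communicating with users and giving them innovative experiences. It can be summed up in one sentence: "Pushing chat interaction to its full potential and tailoring the user experience to solve users' pain points is the best path to innovation." It helps users pinpoint problems quickly and get correct answers, which improves the user experience while also raising productivity and saving time. ChatGPT can also provide personalized service based on each user's characteristics and needs, so that users get a distinctive experience from the interaction.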
Introduction
Chat models take a series of messages as input, and return a model-generated message as output.
Although the chat format is designed to make multi-turn conversations easy, it is just as useful for single-turn tasks without any conversation (such as those previously served by instruction-following models like text-davinci-003).
An example API call looks as follows:
# Note: the code below requires OpenAI Python v0.27.0 or later;
# the openai.ChatCompletion interface was removed in v1.0
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)
The main input is the messages parameter. Messages must be an array of message objects, where each object has a role (either "system", "user", or "assistant") and content (the content of the message). Conversations can be as short as 1 message or fill many pages.
Typically, a conversation is formatted with a system message first, followed by alternating user and assistant messages.
The system message helps set the behavior of the assistant. In the example above, the assistant was instructed with "You are a helpful assistant."
gpt-3.5-turbo-0301 does not always pay strong attention to system messages. Future models will be trained to pay stronger attention to system messages.
The user messages help instruct the assistant. They can be generated by the end users of an application, or set by a developer as an instruction.
The assistant messages help store prior responses. They can also be written by a developer to help give examples of desired behavior.
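As a hedged illustration of developer-written example messages, the sketch below assembles a messages list in which user and assistant turns written by the developer precede the real question. The function name `build_few_shot_messages` is hypothetical and not part of the OpenAI library; the example texts are taken from the token-counting sample later in this guide.

```python
def build_few_shot_messages(system_prompt, examples, question):
    """Return a messages list: system message, example turns, then the real question.

    `examples` is a list of (user_text, assistant_text) pairs written by the developer.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


few_shot = build_few_shot_messages(
    "You translate corporate jargon into plain English.",
    [("New synergies will help drive top-line growth.",
      "Things working well together will increase revenue.")],
    "This late pivot means we don't have time to boil the ocean.",
)
print([m["role"] for m in few_shot])
```

The resulting list can be passed as the messages parameter of a chat request.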
Including the conversation history helps when user instructions refer to prior messages. In the example above, the user's final question of "Where was it played?" only makes sense in the context of the prior messages about the World Series of 2020. Because the models have no memory of past requests, all relevant information must be supplied via the conversation. If a conversation cannot fit within the model's token limit, it will need to be shortened in some way.
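Because the model keeps no memory between requests, the application has to hold the history itself and resend it each time. The class below is a minimal sketch of that bookkeeping; `Conversation` and its methods are hypothetical names chosen for illustration, and no API call is made.

```python
class Conversation:
    """Keeps the running message list that must be resent on every request."""

    def __init__(self, system_prompt):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        # Store the model's reply so later questions can refer back to it.
        self.messages.append({"role": "assistant", "content": text})


chat = Conversation("You are a helpful assistant.")
chat.add_user("Who won the world series in 2020?")
chat.add_assistant("The Los Angeles Dodgers won the World Series in 2020.")
chat.add_user("Where was it played?")
# chat.messages is what would be passed as the messages parameter.
print(len(chat.messages))
```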
Response format
An example API response looks as follows:
{
    'id': 'chatcmpl-6p9XYPYSTTRi0xEviKjjilqrWU2Ve',
    'object': 'chat.completion',
    'created': 1677649420,
    'model': 'gpt-3.5-turbo',
    'usage': {'prompt_tokens': 56, 'completion_tokens': 31, 'total_tokens': 87},
    'choices': [
        {
            'message': {
                'role': 'assistant',
                'content': 'The 2020 World Series was played in Arlington, Texas at the Globe Life Field, which was the new home stadium for the Texas Rangers.'},
            'finish_reason': 'stop',
            'index': 0
        }
    ]
}
In Python, the assistant's reply can be extracted with response['choices'][0]['message']['content'].
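The lookup above can be wrapped in a small helper. This is only a sketch: `extract_reply` is a hypothetical name, and `sample_response` is a trimmed, hand-written copy of the example response rather than real API output.

```python
def extract_reply(response):
    """Return the text of the first choice in a chat completion response."""
    return response["choices"][0]["message"]["content"]


# Trimmed stand-in for the example response shown above.
sample_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "The 2020 World Series was played in Arlington, Texas.",
            },
            "finish_reason": "stop",
            "index": 0,
        }
    ]
}

print(extract_reply(sample_response))
```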
Every response will include a finish_reason. The possible values for finish_reason are:
- stop: API returned complete model output
- length: Incomplete model output due to max_tokens parameter or token limit
- content_filter: Omitted content due to a flag from our content filters
- null: API response still in progress or incomplete
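An application will usually want to branch on these values. The sketch below maps each value listed above to a short description and flags anything other than stop as incomplete; the names `FINISH_REASONS`, `describe_finish`, and `is_complete` are hypothetical, and JSON null is assumed to arrive in Python as None.

```python
FINISH_REASONS = {
    "stop": "complete model output",
    "length": "cut off by max_tokens or the token limit",
    "content_filter": "content omitted by the content filter",
    None: "response still in progress or incomplete",
}


def describe_finish(response):
    """Return a human-readable description of the first choice's finish_reason."""
    reason = response["choices"][0]["finish_reason"]
    return FINISH_REASONS.get(reason, f"unrecognized finish_reason: {reason!r}")


def is_complete(response):
    """True only when the model finished its output on its own."""
    return response["choices"][0]["finish_reason"] == "stop"


truncated = {"choices": [{"finish_reason": "length"}]}
print(describe_finish(truncated), is_complete(truncated))
```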
Managing tokens
Language models read text in chunks called tokens. In English, a token can be as short as one character or as long as one word (e.g., a or apple), and in some languages tokens can be even shorter than one character or even longer than one word.
For example, the string "ChatGPT is great!" is encoded into six tokens: ["Chat", "G", "PT", " is", " great", "!"].
The total number of tokens in an API call affects:
- How much your API call costs, as you pay per token
- How long your API call takes, as writing more tokens takes more time
- Whether your API call works at all, as total tokens must be below the model's maximum limit (4096 tokens for gpt-3.5-turbo-0301)
Both input and output tokens count toward these quantities. For example, if your API call used 10 tokens in the message input and you received 20 tokens in the message output, you would be billed for 30 tokens.
To see how many tokens are used by an API call, check the usage field in the API response (e.g., response['usage']['total_tokens']).
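The billing arithmetic can be sketched as follows. The function name `summarize_usage` is hypothetical, and the price of 0.002 per 1,000 tokens is a placeholder assumption for illustration only; check current pricing before relying on any figure.

```python
def summarize_usage(response, price_per_1k_tokens):
    """Return (total_tokens, estimated_cost) from a response's usage field."""
    total = response["usage"]["total_tokens"]
    cost = total / 1000 * price_per_1k_tokens
    return total, cost


# Hand-written usage block matching the 10 + 20 = 30 token example above.
sample = {"usage": {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30}}

# 0.002 is a placeholder price, not a quoted rate.
total, cost = summarize_usage(sample, price_per_1k_tokens=0.002)
print(total, cost)
```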
Chat models like gpt-3.5-turbo and gpt-4 use tokens in the same way as other models, but because of their message-based formatting, it is more difficult to count how many tokens will be used by a conversation.
Counting tokens for chat API calls
Below is an example function for counting tokens for messages passed to gpt-3.5-turbo-0301.
The exact way that messages are converted into tokens may change from model to model. So when future model versions are released, the answers returned by this function may be only approximate. The ChatML documentation explains how messages are converted into tokens by the OpenAI API, and may be useful for writing your own function.
import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301"):
    """Returns the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Fall back to the cl100k_base encoding if the model name is not recognized
        encoding = tiktoken.get_encoding("cl100k_base")
    if model == "gpt-3.5-turbo-0301":  # note: future models may deviate from this
        num_tokens = 0
        for message in messages:
            num_tokens += 4  # every message follows <im_start>{role/name}\n{content}<im_end>\n
            for key, value in message.items():
                num_tokens += len(encoding.encode(value))
                if key == "name":  # if there's a name, the role is omitted
                    num_tokens += -1  # role is always required and always 1 token
        num_tokens += 2  # every reply is primed with <im_start>assistant
        return num_tokens
    else:
        raise NotImplementedError(f"""num_tokens_from_messages() is not presently implemented for model {model}.
See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.""")
Next, create a list of messages and pass it to the function defined above to see the token count. This should match the value returned in the usage field of the API response:
messages = [
    {"role": "system", "content": "You are a helpful, pattern-following assistant that translates corporate jargon into plain English."},
    {"role": "system", "name": "example_user", "content": "New synergies will help drive top-line growth."},
    {"role": "system", "name": "example_assistant", "content": "Things working well together will increase revenue."},
    {"role": "system", "name": "example_user", "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage."},
    {"role": "system", "name": "example_assistant", "content": "Let's talk later when we're less busy about how to do better."},
    {"role": "user", "content": "This late pivot means we don't have time to boil the ocean for the client deliverable."},
]

model = "gpt-3.5-turbo-0301"

print(f"{num_tokens_from_messages(messages, model)} prompt tokens counted.")
# Should show ~126 total_tokens
To confirm that the number generated by our function above is the same as what the API returns, create a new Chat Completion:
# example token count from the OpenAI API
import openai

response = openai.ChatCompletion.create(
    model=model,
    messages=messages,
    temperature=0,
)

print(f'{response["usage"]["prompt_tokens"]} prompt tokens used.')
To see how many tokens are in a text string without making an API call, use OpenAI's tiktoken Python library. Example code can be found in the OpenAI Cookbook's guide on how to count tokens with tiktoken.
Each message passed to the API consumes the number of tokens in the content, role, and other fields, plus a few extra for behind-the-scenes formatting. This may change slightly in the future.
If a conversation has too many tokens to fit within a model's maximum limit (e.g., more than 4096 tokens for gpt-3.5-turbo), you will have to truncate, omit, or otherwise shrink your text until it fits. Beware that if a message is removed from the messages input, the model will lose all knowledge of it.
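One possible shrinking strategy, offered only as a sketch, is to drop the oldest non-system messages until the conversation fits. The names `trim_to_budget` and `rough_count` are hypothetical, and `rough_count` is a crude word-count stand-in for a real tokenizer so that the example runs without extra dependencies.

```python
def trim_to_budget(messages, max_tokens, count_tokens):
    """Drop the oldest non-system messages until the conversation fits.

    `count_tokens` is any callable that returns a token count for a messages list.
    The input list is left unmodified.
    """
    trimmed = list(messages)
    while count_tokens(trimmed) > max_tokens:
        for i, message in enumerate(trimmed):
            if message["role"] != "system":
                del trimmed[i]  # remove the oldest non-system message
                break
        else:
            break  # only system messages remain; nothing more to drop
    return trimmed


def rough_count(messages):
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return sum(len(m["content"].split()) for m in messages)


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"},
]

shortened = trim_to_budget(history, max_tokens=10, count_tokens=rough_count)
print([m["role"] for m in shortened])
```

In this run the first question and its answer are dropped, so the model would no longer know what "it" refers to. That is exactly the loss of knowledge the paragraph above warns about; summarizing old turns instead of deleting them is another option.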
Note too that very long conversations are more likely to receive incomplete replies. For example, a gpt-3.5-turbo conversation that is 4090 tokens long will have its reply cut off after just 6 tokens.
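The arithmetic behind that example is simply the limit minus the prompt size. A minimal sketch, with a hypothetical function name and the 4096-token limit quoted above as the default:

```python
def remaining_completion_tokens(prompt_tokens, context_limit=4096):
    """Return how many tokens are left for the reply, never less than zero."""
    return max(context_limit - prompt_tokens, 0)


print(remaining_completion_tokens(4090))  # 4096 - 4090 leaves 6
```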
Instructing chat models
Best practices for instructing models may change from model version to version. The advice that follows applies to gpt-3.5-turbo-0301 and may not apply to future models.
Many conversations begin with a system message to gently instruct the assistant. For example, here is one of the system messages used for ChatGPT:
You are ChatGPT, a large language model trained by OpenAI. Answer as concisely as possible. Knowledge cutoff: {knowledge_cutoff} Current date: {current_date}
In general, gpt-3.5-turbo-0301 does not pay strong attention to the system message, and therefore important instructions are often better placed in a user message.
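One way to act on this advice, sketched under the assumption that the instruction is simply prepended to the user's text (the function name is hypothetical, and other layouts may work as well or better):

```python
def put_instructions_in_user_message(instructions, user_text):
    """Return a one-message conversation with the instructions placed ahead of the user's text."""
    return [{"role": "user", "content": f"{instructions}\n\n{user_text}"}]


request_messages = put_instructions_in_user_message(
    "Answer as concisely as possible.",
    "Where was the 2020 World Series played?",
)
print(request_messages[0]["content"])
```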
If the model isn't generating the output you want, feel free to iterate and experiment with potential improvements. You can try approaches like:
- Make your instruction more explicit
- Specify the format you want the answer in
- Ask the model to think step by step or debate pros and cons before settling on an answer
For more prompt engineering ideas, read the OpenAI Cookbook guide on techniques to improve reliability.
Beyond the system message, temperature and max_tokens are two of the many options developers have to influence the output of the chat models. For temperature, higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. With max_tokens, you can limit a response to a certain length by setting it to a number of your choice. Setting it too low can cause problems: for example, with max_tokens set to 5 the output will be cut off and the result will not make sense to users.
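The sketch below gathers these options into the keyword arguments for a chat request without sending anything. `build_chat_request` is a hypothetical helper, and the accepted temperature range of 0 to 2 is an assumption about the API that is not stated in this guide.

```python
def build_chat_request(messages, model="gpt-3.5-turbo", temperature=0.2, max_tokens=None):
    """Return keyword arguments for a chat completion request."""
    # Assumed valid range; adjust if the API documents something different.
    if not 0 <= temperature <= 2:
        raise ValueError("temperature is expected to be between 0 and 2")
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    request = {"model": model, "messages": messages, "temperature": temperature}
    if max_tokens is not None:
        request["max_tokens"] = max_tokens  # omit the key to leave the reply length uncapped
    return request


question = [{"role": "user", "content": "Where was the 2020 World Series played?"}]
request = build_chat_request(question, temperature=0.8, max_tokens=100)
print(sorted(request))
```

With the pre-1.0 Python library used in this guide, the result could then be passed as openai.ChatCompletion.create(**request).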
Chat vs Completions
Because gpt-3.5-turbo performs at a similar capability to text-davinci-003 but at 10% of the price per token, we recommend gpt-3.5-turbo for most use cases.
For many developers, the transition is as simple as rewriting and retesting a prompt.
For example, if you translated English to French with the following completions prompt:
Translate the following English text to French: "{text}"
An equivalent chat conversation could look like:
[
    {"role": "system", "content": "You are a helpful assistant that translates English to French."},
    {"role": "user", "content": 'Translate the following English text to French: "{text}"'}
]
Or even just the user message:
[
    {"role": "user", "content": 'Translate the following English text to French: "{text}"'}
]
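The conversion can be expressed as a small helper. This is a sketch with a hypothetical function name; it wraps an existing completions prompt in a user message and optionally adds a system message.

```python
def completion_prompt_to_messages(prompt, system_prompt=None):
    """Wrap a completions-style prompt in the chat messages format."""
    messages = []
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    return messages


text = "Hello, world!"
prompt = f'Translate the following English text to French: "{text}"'

with_system = completion_prompt_to_messages(
    prompt, "You are a helpful assistant that translates English to French."
)
user_only = completion_prompt_to_messages(prompt)
print(len(with_system), len(user_only))
```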
FAQ
Is fine-tuning available for gpt-3.5-turbo?
No. As of March 1, 2023, you can only fine-tune base GPT-3 models. See the fine-tuning guide for more details on how to use fine-tuned models.
Do you store the data that is passed into the API?
As of March 1, 2023, we retain your API data for 30 days but no longer use your data sent via the API to improve our models. Learn more in our data usage policy.
Adding a moderation layer
If you want to add a moderation layer to the outputs of the Chat API, you can follow our moderation guide to prevent content that violates OpenAI's usage policies from being shown.
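Other resources to download
If you would like to keep exploring learning paths and knowledge maps for artificial intelligence, see my other blog post, 《重磅 | 完备的人工智能AI 学习——基础知识学习路线,所有资料免关注免套路直接网盘下载》 (a complete learning path for AI fundamentals, with all materials available for direct download from a cloud drive, no follow required and no strings attached).
That post draws on nearly 100 GB of material from well-known open-source projects on GitHub, AI technology platforms, and experts in the field, including Datawhale, ApacheCN, AI有道, and Dr. Huang Haiguang. I hope it helps everyone.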