Fine-tuning ChatGPT Models

Preface
Introduction
What models can be fine-tuned?
Installation
Prepare training data
- CLI data preparation tool
- Create a fine-tuned model
- Use a fine-tuned model
- Delete a fine-tuned model
Preparing your dataset
- Data formatting
- General best practices
- Specific guidelines
  - Classification
    - Case study: Is the model making untrue statements?
    - Case study: Sentiment analysis
    - Case study: Categorization for Email triage
  - Conditional generation
    - Case study: Write an engaging ad based on a Wikipedia article
    - Case study: Entity extraction
    - Case study: Customer support chatbot
    - Case study: Product description based on a technical list of properties
Advanced usage
- Customize your model name
- Analyzing your fine-tuned model
- Classification specific metrics
  - For multiclass classification
  - For binary classification
- Validation
- Hyperparameters
- Continue fine-tuning from a fine-tuned model
Weights & Biases
Example notebooks
- Classification
- Question answering
Other resources and downloads

Fine-tuning
Learn how to customize a model for your application.

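Preface

Fine-tuning lets users train a natural language model on their own corpus so that it is more accurate, more reliable, and better suited to their use case.

Fine-tuning takes a previously pre-trained language model as the base architecture (in this guide, one of the GPT-3 base models davinci, curie, babbage, or ada rather than GPT-3.5) and retrains it on the user's own corpus to improve its results.

During fine-tuning, the model's parameters are retrained on a corpus built around the intents and tasks of the user's specific scenario, so that the model responds better to conversations in that scenario. The end result is a model optimized to perform well in a real-time chat setting.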

Introduction

Fine-tuning lets you get more out of the models available through the API by providing:

- Higher quality results than prompt design
- Ability to train on more examples than can fit in a prompt
- Token savings due to shorter prompts
- Lower latency requests

GPT-3 has been pre-trained on a vast amount of text from the open internet. When given a prompt with just a few examples, it can often intuit what task you are trying to perform and generate a plausible completion. This is often called “few-shot learning.”

Fine-tuning improves on few-shot learning by training on many more examples than can fit in the prompt, letting you achieve better results on a wide range of tasks. Once a model has been fine-tuned, you won’t need to provide examples in the prompt anymore. This saves costs and enables lower-latency requests.

At a high level, fine-tuning involves the following steps:

- Prepare and upload training data
- Train a new fine-tuned model
- Use your fine-tuned model

Visit our pricing page to learn more about how fine-tuned model training and usage are billed.

What models can be fine-tuned?

Fine-tuning is currently only available for the following base models: davinci, curie, babbage, and ada. These are the original models that do not have any instruction-following training (as text-davinci-003 does, for example). You are also able to continue fine-tuning a fine-tuned model to add additional data without having to start from scratch.

Installation

We recommend using our OpenAI command-line interface (CLI). To install it, run

pip install --upgrade openai

(The following instructions work for version 0.9.4 and up of the pre-1.0 library; the openai.Completion class and the openai api fine_tunes.* commands used below are not available in openai 1.x. The OpenAI CLI requires Python 3.)

Set your OPENAI_API_KEY environment variable by adding the following line to your shell initialization script (e.g. .bashrc, .zshrc) or by running it in the command line before the fine-tuning command:

export OPENAI_API_KEY="<OPENAI_API_KEY>"

Prepare training data

Training data is how you teach GPT-3 what you’d like it to say.

Your data must be a JSONL document, where each line is a prompt-completion pair corresponding to a training example. You can use our CLI data preparation tool to easily convert your data into this file format.

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
...
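As a minimal sketch of this format, the Python below serializes (prompt, completion) pairs to JSONL and parses them back, checking that every line is a standalone JSON object with string prompt and completion fields. The helper names to_jsonl and parse_jsonl are hypothetical illustrations, not part of the OpenAI library or CLI.

```python
import json


def to_jsonl(examples):
    """Serialize (prompt, completion) pairs as JSONL: one JSON object per line."""
    return "".join(
        json.dumps({"prompt": prompt, "completion": completion}) + "\n"
        for prompt, completion in examples
    )


def parse_jsonl(text):
    """Parse JSONL text, checking each line has string 'prompt' and 'completion' keys."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        for key in ("prompt", "completion"):
            if not isinstance(record.get(key), str):
                raise ValueError(f"line {lineno}: missing or non-string {key!r}")
        records.append(record)
    return records
```

To save the result, write the returned string to a file such as train.jsonl opened with encoding="utf-8". Because json.dumps escapes newlines inside strings, multi-line prompts still occupy exactly one line each.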

CLI data preparation tool

We developed a tool which validates, gives suggestions and reformats your data:

openai tools fine_tunes.prepare_data -f <LOCAL_FILE>

This tool accepts different formats, with the only requirement that they contain a prompt and a completion column/key. You can pass a CSV, TSV, XLSX, JSON or JSONL file, and after guiding you through the suggested changes, it will save the output into a JSONL file ready for fine-tuning.
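The real prepare_data tool does much more (validation, suggestions, interactive reformatting). As a rough, hypothetical stand-in for just its final conversion step, this sketch turns CSV text with prompt and completion columns into JSONL using only the standard library. csv_to_jsonl is an illustrative name, not the tool's implementation.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text):
    """Convert CSV text with 'prompt' and 'completion' columns into JSONL text.

    Other columns are dropped. Whitespace inside fields is kept as-is, so a
    completion written as ' positive' keeps its leading space.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {"prompt", "completion"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")
    lines = [
        json.dumps({"prompt": row["prompt"], "completion": row["completion"]})
        for row in reader
    ]
    return "\n".join(lines) + ("\n" if lines else "")
```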

Create a fine-tuned model

The following assumes you’ve already prepared training data following the above instructions.

Start your fine-tuning job using the OpenAI CLI:

openai api fine_tunes.create -t <TRAIN_FILE_ID_OR_PATH> -m <BASE_MODEL>

Here, BASE_MODEL is the name of the base model you’re starting from (ada, babbage, curie, or davinci). You can customize your fine-tuned model’s name using the suffix parameter.

Running the above command does several things:

- Uploads the file using the files API (or uses an already-uploaded file)
- Creates a fine-tune job
- Streams events until the job is done (this often takes minutes, but can take hours if there are many jobs in the queue or your dataset is large)

Every fine-tuning job starts from a base model, which defaults to curie. The choice of model influences both the performance of the model and the cost of running your fine-tuned model. Your model can be one of: ada, babbage, curie, or davinci. Visit our pricing page for details on fine-tune rates.

After you’ve started a fine-tune job, it may take some time to complete. Your job may be queued behind other jobs on our system, and training our model can take minutes or hours depending on the model and dataset size. If the event stream is interrupted for any reason, you can resume it by running:

openai api fine_tunes.follow -i <YOUR_FINE_TUNE_JOB_ID>

When the job is done, it should display the name of the fine-tuned model.

In addition to creating a fine-tune job, you can also list existing jobs, retrieve the status of a job, or cancel a job.

# List all created fine-tunes
openai api fine_tunes.list

# Retrieve the state of a fine-tune. The resulting object includes
# job status (which can be one of pending, running, succeeded, or failed)
# and other information
openai api fine_tunes.get -i <YOUR_FINE_TUNE_JOB_ID>

# Cancel a job
openai api fine_tunes.cancel -i <YOUR_FINE_TUNE_JOB_ID>

Use a fine-tuned model

When a job has succeeded, the fine_tuned_model field will be populated with the name of the model. You may now specify this model as a parameter to our Completions API, and make requests to it using the Playground.

After your job first completes, it may take several minutes for your model to become ready to handle requests. If completion requests to your model time out, it is likely because your model is still being loaded. If this happens, try again in a few minutes.

You can start making requests by passing the model name as the model parameter of a completion request:

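OpenAI CLI:

openai api completions.create -m <FINE_TUNED_MODEL> -p <YOUR_PROMPT>

cURL:

curl https://api.openai.com/v1/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "YOUR_PROMPT", "model": "FINE_TUNED_MODEL"}'

Python:

import openai  # pre-1.0 library; reads OPENAI_API_KEY from the environment

FINE_TUNED_MODEL = "<FINE_TUNED_MODEL>"  # the fine_tuned_model name from your job
YOUR_PROMPT = "<YOUR_PROMPT>"

response = openai.Completion.create(
    model=FINE_TUNED_MODEL,
    prompt=YOUR_PROMPT)
print(response["choices"][0]["text"])

Node.js:

const response = await openai.createCompletion({
  model: FINE_TUNED_MODEL,
  prompt: YOUR_PROMPT,
});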

You may continue to use all the other [Completions](https://platform.openai.com/docs/api-reference/completions) parameters, such as temperature, frequency_penalty, and presence_penalty, on these requests to fine-tuned models.

Delete a fine-tuned model

To delete a fine-tuned model, you must be designated an “owner” within your organization.

OpenAI CLI:

openai api models.delete -i <FINE_TUNED_MODEL>

cURL:

curl -X "DELETE" https://api.openai.com/v1/models/<FINE_TUNED_MODEL> \
  -H "Authorization: Bearer $OPENAI_API_KEY"

Python:

import openai  # pre-1.0 library; reads OPENAI_API_KEY from the environment

FINE_TUNED_MODEL = "<FINE_TUNED_MODEL>"  # the fine-tuned model to delete
openai.Model.delete(FINE_TUNED_MODEL)

Node.js:

const response = await openai.deleteModel(FINE_TUNED_MODEL);

Preparing your dataset

Fine-tuning is a powerful technique to create a new model that’s specific to your use case. Before fine-tuning your model, we strongly recommend reading these best practices and specific guidelines for your use case below.

Data formatting

To fine-tune a model, you’ll need a set of training examples that each consist of a single input (“prompt”) and its associated output (“completion”). This is notably different from using our base models, where you might input detailed instructions or multiple examples in a single prompt.

- Each prompt should end with a fixed separator to inform the model when the prompt ends and the completion begins. A simple separator which generally works well is \n\n###\n\n. The separator should not appear elsewhere in any prompt.
- Each completion should start with a whitespace due to our tokenization, which tokenizes most words with a preceding whitespace.
- Each completion should end with a fixed stop sequence to inform the model when the completion ends. A stop sequence could be \n, ###, or any other token that does not appear in any completion.
- For inference, you should format your prompts in the same way as you did when creating the training dataset, including the same separator. Also specify the same stop sequence to properly truncate the completion.
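The formatting rules above can be sketched as a small helper. This assumes the suggested \n\n###\n\n separator and a \n stop sequence; format_example and format_inference_prompt are hypothetical names used only for illustration.

```python
SEPARATOR = "\n\n###\n\n"  # fixed separator appended to every prompt
STOP = "\n"                # fixed stop sequence appended to every completion


def format_example(prompt, completion, separator=SEPARATOR, stop=STOP):
    """Build one training record: prompt + separator, ' ' + completion + stop."""
    if separator in prompt:
        raise ValueError("the separator must not appear inside the prompt")
    completion = completion.strip()
    if stop in completion:
        raise ValueError("the stop sequence must not appear inside the completion")
    return {
        "prompt": prompt + separator,
        "completion": " " + completion + stop,  # leading space for tokenization
    }


def format_inference_prompt(prompt, separator=SEPARATOR):
    """At inference time, append the same separator used during training."""
    return prompt + separator
```

At inference, send format_inference_prompt(text) as the prompt and pass the same stop sequence (here stop=["\n"]) in the completion request.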

General best practices

Fine-tuning performs better with more high-quality examples. To fine-tune a model that performs better than using a high-quality prompt with our base models, you should provide at least a few hundred high-quality examples, ideally vetted by human experts. From there, performance tends to increase linearly with every doubling of the number of examples. Increasing the number of examples is usually the best and most reliable way of improving performance.

Classifiers are the easiest models to get started with. For classification problems we suggest using ada, which generally tends to perform only very slightly worse than more capable models once fine-tuned, whilst being significantly faster and cheaper.

If you are fine-tuning on a pre-existing dataset rather than writing prompts from scratch, be sure to manually review your data for offensive or inaccurate content if possible, or review as many random samples of the dataset as possible if it is large.

Specific guidelines

Fine-tuning can solve a variety of problems, and the optimal way to use it may depend on your specific use case. Below, we’ve listed the most common use cases for fine-tuning and corresponding guidelines.

Classification

In classification problems, each input in the prompt should be classified into one of the predefined classes. For this type of problem, we recommend:

- Use a separator at the end of the prompt, e.g. \n\n###\n\n. Remember to also append this separator when you eventually make requests to your model.
- Choose classes that map to a single token. At inference time, specify max_tokens=1 since you only need the first token for classification.
- Ensure that the prompt + completion doesn’t exceed 2048 tokens, including the separator.
- Aim for at least ~100 examples per class.
- To get class log probabilities, you can specify logprobs=5 (for 5 classes) when using your model.
- Ensure that the dataset used for fine-tuning is very similar in structure and type of task to what the model will be used for.
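A quick way to check the ~100-examples-per-class guideline before uploading is to count labels. This sketch assumes each record is a dict with the class label in its completion field; class_counts and underrepresented are hypothetical helpers. Checking that each label is a single token would additionally require a tokenizer, which is not shown here.

```python
from collections import Counter

MIN_PER_CLASS = 100  # the "~100 examples per class" guideline


def class_counts(records):
    """Count training examples per class label (the completion text)."""
    return Counter(record["completion"] for record in records)


def underrepresented(records, minimum=MIN_PER_CLASS):
    """Return, sorted, the class labels that have fewer than `minimum` examples."""
    counts = class_counts(records)
    return sorted(label for label, n in counts.items() if n < minimum)
```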

Case study: Is the model making untrue statements?

Let’s say you’d like to ensure that the text of the ads on your website mentions the correct product and company. In other words, you want to ensure the model isn’t making things up. You may want to fine-tune a classifier which filters out incorrect ads.

The dataset might look something like the following:

{"prompt":"Company: BHFF insurance\nProduct: allround insurance\nAd:One stop shop for all your insurance needs!\nSupported:", "completion":" yes"}
{"prompt":"Company: Loft conversion specialists\nProduct: -\nAd:Straight teeth in weeks!\nSupported:", "completion":" no"}

In the example above, we used a structured input containing the name of the company, the product, and the associated ad. As a separator we used \nSupported: which clearly separated the prompt from the completion. With a sufficient number of examples, the separator doesn’t make much of a difference (usually less than 0.4%) as long as it doesn’t appear within the prompt or the completion.

For this use case we fine-tuned an ada model since it will be faster and cheaper, and the performance will be comparable to larger models because it is a classification task.

Now we can query our model by making a Completion request.

curl https://api.openai.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "prompt": "Company: Reliable accountants Ltd\nProduct: Personal Tax help\nAd:Best advice in town!\nSupported:",
    "max_tokens": 1,
    "model": "YOUR_FINE_TUNED_MODEL_NAME"
  }'

This returns either yes or no.

Case study: Sentiment analysis

Let’s say you’d like to measure the degree to which a particular tweet is positive or negative. The dataset might look something like the following:

{"prompt":"Overjoyed with the new iPhone! ->", "completion":" positive"}
{"prompt":"@lakers disappoint for a third straight night https://t.co/38EFe43 ->", "completion":" negative"}

Once the model is fine-tuned, you can get back the log probabilities for the first completion token by setting logprobs=2 on the completion request. The higher the probability for the positive class, the higher the relative sentiment.

Now we can query our model by making a Completion request.

curl https://api.openai.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "prompt": "https://t.co/f93xEd2 Excited to share my latest blog post! ->",
    "max_tokens": 1,
    "logprobs": 2,
    "model": "YOUR_FINE_TUNED_MODEL_NAME"
  }'

This returns:

{
  "id": "cmpl-COMPLETION_ID",
  "object": "text_completion",
  "created": 1589498378,
  "model": "YOUR_FINE_TUNED_MODEL_NAME",
  "choices": [
    {
      "logprobs": {
        "text_offset": [
          19
        ],
        "token_logprobs": [
          -0.03597255
        ],
        "tokens": [
          " positive"
        ],
        "top_logprobs": [
          {
            " negative": -4.9785037,
            " positive": -0.03597255
          }
        ]
      },

      "text": " positive",
      "index": 0,
      "finish_reason": "length"
    }
  ]
}
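To turn the response above into a single sentiment score, one option (an assumption, not something the API returns directly) is to exponentiate the two log probabilities and normalize them against each other. The sketch below does that for the " positive" and " negative" labels; positive_score is a hypothetical helper that takes the parsed JSON response as a dict.

```python
import math


def positive_score(response):
    """Return P(positive) / (P(positive) + P(negative)) for the first completion token.

    `response` is the parsed JSON body of a completion request made with
    max_tokens=1 and logprobs=2.
    """
    top = response["choices"][0]["logprobs"]["top_logprobs"][0]
    # A label missing from top_logprobs gets log probability -inf, i.e. probability 0.0.
    p_pos = math.exp(top.get(" positive", float("-inf")))
    p_neg = math.exp(top.get(" negative", float("-inf")))
    total = p_pos + p_neg
    if total == 0.0:
        raise ValueError("neither ' positive' nor ' negative' is in top_logprobs")
    return p_pos / total
```

For the response shown above this gives a score of roughly 0.99, since " positive" has a log probability near 0 and " negative" one near -5.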

Case study: Categorization for Email triage

Let’s say you’d like to categorize incoming email into one of a large number of predefined categories. For classification into a large number of categories, we recommend you convert those categories into numbers, which will work well up to ~500 categories. We’ve observed that adding a space before the number sometimes slightly helps the performance, due to tokenization. You may want to structure your training data as follows:

{"prompt":"Subject: <email_subject>\nFrom:<customer_name>\nDate:<date>\nContent:<email_body>\n\n###\n\n", "completion":" <numerical_category>"}

For example:

{"prompt":"Subject: Update my address\nFrom:Joe Doe\nTo:support@ourcompany.com\nDate:2021-06-03\nContent:Hi,\nI would like to update my billing address to match my delivery address.\n\nPlease let me know once done.\n\nThanks,\nJoe\n\n###\n\n", "completion":" 4"}

In the example above we used an incoming email capped at 2043 tokens as input. (This leaves room for a 4-token separator and a 1-token completion, for a total of 2048.) As a separator we used \n\n###\n\n and we removed any occurrence of ### within the email.
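The bookkeeping in this example can be sketched in Python: the 2043-token budget is 2048 minus the 4 separator tokens and 1 completion token, categories become numbers with a leading space, and ### is stripped from the email so that the separator stays unique. The helper names are hypothetical, and actually truncating an email to 2043 tokens would need a tokenizer, which this sketch omits.

```python
SEPARATOR = "\n\n###\n\n"
CONTEXT_TOKENS = 2048    # prompt + completion limit quoted in this guide
SEPARATOR_TOKENS = 4     # as stated in the example above
COMPLETION_TOKENS = 1    # one numeric category token
EMAIL_TOKEN_BUDGET = CONTEXT_TOKENS - SEPARATOR_TOKENS - COMPLETION_TOKENS


def category_labels(categories):
    """Map category names to numeric completion labels with a leading space."""
    return {name: f" {i}" for i, name in enumerate(sorted(categories))}


def build_prompt(subject, sender, date, body):
    """Format one email as a prompt; strip '###' so only the separator contains it."""
    subject, sender, date, body = (
        field.replace("###", "") for field in (subject, sender, date, body)
    )
    return f"Subject: {subject}\nFrom:{sender}\nDate:{date}\nContent:{body}{SEPARATOR}"
```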

Conditional generation

Conditional generation is a problem where the content needs to be generated given some kind of input. This includes paraphrasing, summarizing, entity extraction, writing product descriptions from specifications, chatbots, and many others. For this type of problem we recommend:

- Use a separator at the end of the prompt, e.g. \n\n###\n\n. Remember to also append this separator when you eventually make requests to your model.
- Use an ending token at the end of the completion, e.g. END.
- Remember to add the ending token as a stop sequence during inference, e.g. stop=[" END"].
- Aim for at least ~500 examples.
- Ensure that the prompt + completion doesn’t exceed 2048 tokens, including the separator.
- Ensure the examples are of high quality and follow the same desired format.
- Ensure that the dataset used for fine-tuning is very similar in structure and type of task to what the model will be used for.
- Using a lower learning rate and only 1-2 epochs tends to work better for these use cases.
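A hypothetical pre-upload checker for these recommendations might look like this. It assumes the \n\n###\n\n separator and " END" ending token from the list above and reports problems rather than fixing them. It does not count tokens, so the 2048-token limit is not checked.

```python
SEPARATOR = "\n\n###\n\n"
END = " END"          # ending token appended to every completion
MIN_EXAMPLES = 500    # "aim for at least ~500 examples"


def check_generation_dataset(records, separator=SEPARATOR, end=END):
    """Return a list of human-readable problems found in the dataset."""
    problems = []
    if len(records) < MIN_EXAMPLES:
        problems.append(f"only {len(records)} examples; aim for at least {MIN_EXAMPLES}")
    for i, record in enumerate(records):
        prompt, completion = record["prompt"], record["completion"]
        if not prompt.endswith(separator):
            problems.append(f"example {i}: prompt does not end with the separator")
        elif separator in prompt[: -len(separator)]:
            problems.append(f"example {i}: separator also appears inside the prompt")
        if not completion.startswith(" "):
            problems.append(f"example {i}: completion does not start with a space")
        if not completion.endswith(end):
            problems.append(f"example {i}: completion does not end with {end.strip()!r}")
    return problems
```

Remember to pass stop=[" END"] at inference so generation halts at the ending token.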

Case study: Write an engaging ad based on a Wikipedia article

This is a generative use case, so you would want to ensure that the samples you provide are of the highest quality, as the fine-tuned model will try to imitate the style (and mistakes) of the given examples. A good starting point is around 500 examples. A sample dataset might look like this:

{"prompt":"<Product Name>\n<Wikipedia description>\n\n###\n\n", "completion":" <engaging ad> END"}

For example:

{"prompt":"Samsung Galaxy Feel\nThe Samsung Galaxy Feel is an Android smartphone developed by Samsung Electronics exclusively for the Japanese market. The phone was released in June 2017 and was sold by NTT Docomo. It runs on Android 7.0 (Nougat), has a 4.7 inch display, and a 3000 mAh battery.\nSoftware\nSamsung Galaxy Feel runs on Android 7.0 (Nougat), but can be later updated to Android 8.0 (Oreo).\nHardware\nSamsung Galaxy Feel has a 4.7 inch Super AMOLED HD display, 16 MP back facing and 5 MP front facing cameras. It has a 3000 mAh battery, a 1.6 GHz Octa-Core ARM Cortex-A53 CPU, and an ARM Mali-T830 MP1 700 MHz GPU. It comes with 32GB of internal storage, expandable to 256GB via microSD. Aside from its software and hardware specifications, Samsung also introduced a unique hole in the phone's shell to accommodate the Japanese perceived penchant for personalizing their mobile phones. The Galaxy Feel's battery was also touted as a major selling point since the market favors handsets with longer battery life. The device is also waterproof and supports 1seg digital broadcasts using an antenna that is sold separately.\n\n###\n\n", "completion":" Looking for a smartphone that can do it all? Look no further than Samsung Galaxy Feel! With a slim and sleek design, our latest smartphone features high-quality picture and video capabilities, as well as an award winning battery life. END"}

Here we used a multi-line separator, as Wikipedia articles contain multiple paragraphs and headings. We also used a simple end token, to ensure that the model knows when the completion should finish.

Case study: Entity extraction

This is similar to a language transformation task. To improve the performance, it is best to sort the extracted entities either alphabetically or in the same order as they appear in the original text. This will help the model keep track of all the entities which need to be generated in order. The dataset could look as follows:

{"prompt":"<any text, for example news article>\n\n###\n\n", "completion":" <list of entities, separated by a newline> END"}

For example:

{"prompt":"Portugal will be removed from the UK's green travel list from Tuesday, amid rising coronavirus cases and concern over a \"Nepal mutation of the so-called Indian variant\". It will join the amber list, meaning holidaymakers should not visit and returnees must isolate for 10 days...\n\n###\n\n", "completion":" Portugal\nUK\nNepal mutation\nIndian variant END"}

A multi-line separator works best, as the text will likely contain multiple lines. Ideally there will be a high diversity of the types of input prompts (news articles, Wikipedia pages, tweets, legal documents), reflecting the kinds of text likely to be encountered when extracting entities.
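Ordering entities by first appearance, as suggested above, can be done mechanically. This sketch assumes each entity occurs verbatim in the text and uses the newline-separated, " END"-terminated completion format shown above; order_entities and entity_completion are hypothetical names.

```python
def order_entities(text, entities):
    """Sort entities by the position where each first appears in the source text."""
    positions = {}
    for entity in entities:
        index = text.find(entity)
        if index == -1:
            raise ValueError(f"entity not found in text: {entity!r}")
        positions[entity] = index
    return sorted(entities, key=positions.__getitem__)


def entity_completion(text, entities, end=" END"):
    """Build a completion: leading space, one entity per line, then the end token."""
    return " " + "\n".join(order_entities(text, entities)) + end
```

Applied to the Portugal excerpt with the entities listed in any order, it reproduces the completion " Portugal\nUK\nNepal mutation\nIndian variant END".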

Case study: Customer support chatbot

A chatbot will normally contain relevant context about the conversation (order details), a summary of the conversation so far, and the most recent messages. For this use case, the same past conversation can generate multiple rows in the dataset, one for each agent response used as a completion, each with slightly different context. This use case will require a few thousand examples, as it will likely deal with different types of requests and customer issues. To ensure high-quality performance, we recommend vetting the conversation samples to ensure the quality of the agent messages. The summary can be generated with a separate text-transformation fine-tuned model. The dataset could look as follows:

{"prompt":"Summary: <summary of the interaction so far>\n\nSpecific information:<for example order details in natural language>\n\n###\n\nCustomer: <message1>\nAgent: <response1>\nCustomer: <message2>\nAgent:", "completion":" <response2>\n"}
{"prompt":"Summary: <summary of the interaction so far>\n\nSpecific information:<for example order details in natural language>\n\n###\n\nCustomer: <message1>\nAgent: <response1>\nCustomer: <message2>\nAgent: <response2>\nCustomer: <message3>\nAgent:", "completion":" <response3>\n"}

Here we purposefully separated different types of input information, but maintained the Customer/Agent dialog in the same format between a prompt and a completion. All the completions should only be by the agent, and we can use \n as a stop sequence when doing inference.
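The idea of expanding one conversation into several rows, one per agent reply, can be sketched as below. For simplicity this assumes a single fixed summary and information string for the whole conversation, whereas in practice the summary would be updated as the conversation progresses; conversation_rows is a hypothetical helper.

```python
SEPARATOR = "\n\n###\n\n"


def conversation_rows(summary, info, turns):
    """Expand one conversation into one training row per agent reply.

    `turns` is a list of (customer_message, agent_response) pairs in order.
    Each prompt holds the dialog up to the latest customer message followed
    by "Agent:"; each completion is the agent reply plus a newline stop sequence.
    """
    header = f"Summary: {summary}\n\nSpecific information:{info}{SEPARATOR}"
    rows, history = [], ""
    for customer, agent in turns:
        history += f"Customer: {customer}\nAgent:"
        rows.append({"prompt": header + history, "completion": f" {agent}\n"})
        history += f" {agent}\n"  # the reply becomes context for the next row
    return rows
```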

Case study: Product description based on a technical list of properties

Here it is important to convert the input data into natural language, which will likely lead to superior performance. For example, the following format:

{"prompt":"Item=handbag, Color=army_green, price=$99, size=S->", "completion":" This stylish small green handbag will add a unique touch to your look, without costing you a fortune."}

Won’t work as well as:

{"prompt":"Item is a handbag. Colour is army green. Price is midrange. Size is small.->", "completion":" This stylish small green handbag will add a unique touch to your look, without costing you a fortune."}

For high performance, ensure that the completions are based on the description provided. If external content is often consulted, adding such content in an automated way would improve performance. If the description is based on images, it may help to use an algorithm to extract a textual description of the image. Since completions are only one sentence long, we can use . as the stop sequence during inference.
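Converting structured properties into sentences can be partly automated. This sketch only rewrites key=value pairs as "Key is value." and replaces underscores with spaces; mapping values such as S to small or $99 to midrange, as in the example above, is domain-specific and not shown. properties_to_text and product_prompt are hypothetical names, and -> is the separator used in the example.

```python
def properties_to_text(properties):
    """Turn key=value product properties into short natural-language sentences."""
    sentences = []
    for key, value in properties.items():
        readable = str(value).replace("_", " ")  # e.g. "army_green" -> "army green"
        sentences.append(f"{key.capitalize()} is {readable}.")
    return " ".join(sentences)


def product_prompt(properties, separator="->"):
    """Build a prompt in the natural-language style, ending with the separator."""
    return properties_to_text(properties) + separator
```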

Advanced usage

Customize your model name

You can add a suffix of up to 40 characters to your fine-tuned model name using the suffix parameter.

OpenAI CLI:

openai api fine_tunes.create -t test.jsonl -m ada --suffix "custom model name"

The resulting name would be:

ada:ft-your-org:custom-model-name-2022-02-15-04-21-04
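If you want to validate a suffix before starting a job, a client-side check is straightforward. The 40-character limit comes from the text above; how the suffix is turned into the model name (lowercased, spaces replaced by hyphens) is only inferred from the single example shown and may not match the service's exact rules. Both helper names are hypothetical.

```python
MAX_SUFFIX_LENGTH = 40  # limit stated in the text above


def check_suffix(suffix):
    """Raise if the suffix exceeds the documented 40-character limit."""
    if len(suffix) > MAX_SUFFIX_LENGTH:
        raise ValueError(
            f"suffix has {len(suffix)} characters; the limit is {MAX_SUFFIX_LENGTH}"
        )
    return suffix


def guessed_name_fragment(suffix):
    """Guess how the suffix appears in the model name.

    Assumption inferred from one example only:
    "custom model name" -> "custom-model-name".
    """
    return "-".join(check_suffix(suffix).lower().split())
```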

Analyzing your fine-tuned model

We attach a results file to each job once it has been completed. This results file ID will be listed when you retrieve a fine-tune, and also when you look at the events on a fine-tune. You can download these files:

OpenAI CLI:

openai api fine_tunes.results -i <YOUR_FINE_TUNE_JOB_ID>

cURL:

curl https://api.openai.com/v1/files/$RESULTS_FILE_ID/content \
  -H "Authorization: Bearer $OPENAI_API_KEY" > results.csv
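Once downloaded, results.csv can be inspected with the standard csv module. This sketch returns the final row's values for a few columns, converted to floats; the column names are taken from the field descriptions in this section, and the name step for the step-number column is an assumption. last_step_metrics is a hypothetical helper.

```python
import csv
import io


def last_step_metrics(csv_text, columns=("step", "training_loss", "training_token_accuracy")):
    """Return the requested columns from the final row of a results CSV as floats.

    Columns that are missing or empty in the last row are skipped.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("results file has no data rows")
    last = rows[-1]
    return {column: float(last[column]) for column in columns if last.get(column)}
```

For example, with open("results.csv", encoding="utf-8") as f: print(last_step_metrics(f.read())).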

The _results.csv file contains a row for each training step, where a step refers to one forward and backward pass on a batch of data. In addition to the step number, each row contains the following fields corresponding to that step:

- elapsed_tokens: the number of tokens the model has seen so far (including repeats)
- elapsed_examples: the number of examples the model has seen so far (including repeats), where one example is one element in your batch. For example, if batch_size = 4, each step will increase elapsed_examples by 4.
- training_loss: loss on the training batch
- training_sequence_accuracy: the percentage of completions in the training batch for which the model’s predicted tokens matched the true completion tokens exactly. For example, with a batch_size of 3, if your data contains the completions [[1, 2], [0, 5], [4, 2]] and the model predicted [[1, 1], [0, 5], [4, 2]], this accuracy will be 2/3 = 0.67
- training_token_accuracy: the percentage of tokens in the training batch that were correctly predicted by the model. For example, with a batch_size of 3, if your data contains the completions [[1, 2], [0, 5], [4, 2]] and the model predicted [[1, 1], [0, 5], [4, 2]], this accuracy will be 5/6 = 0.83
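The two accuracy definitions above can be checked against the worked example with a short sketch. Here a completion is represented as a list of token IDs, as in the example; sequence_accuracy and token_accuracy are hypothetical helpers, and tokens missing from a shorter prediction count as incorrect.

```python
def sequence_accuracy(true, predicted):
    """Fraction of completions whose predicted tokens match the true tokens exactly."""
    if len(true) != len(predicted):
        raise ValueError("true and predicted batches differ in size")
    return sum(t == p for t, p in zip(true, predicted)) / len(true)


def token_accuracy(true, predicted):
    """Fraction of true tokens, across the batch, that were predicted correctly."""
    correct = total = 0
    for t, p in zip(true, predicted):
        total += len(t)
        correct += sum(a == b for a, b in zip(t, p))
    return correct / total
```

With the example batch, these return 2/3 ≈ 0.67 and 5/6 ≈ 0.83, matching the values quoted above.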

Classification specific metrics

We also provide the option of generating additional classification-specific metrics in the results file, such as accuracy and weighted F1 score. These metrics are periodically calculated against the full validation set and at the end of fine-tuning. You will see them as additional columns in your results file.

To enable this, set the parameter --compute_classification_metrics. Additionally, you must provide a validation file, and set either the classification_n_classes parameter, for multiclass classification, or classification_positive_class (together with classification_n_classes 2), for binary classification.

OpenAI CLI:

# For multiclass classification
openai api fine_tunes.create \
  -t <TRAIN_FILE_ID_OR_PATH> \
  -v <VALIDATION_FILE_OR_PATH> \
  -m <MODEL> \
  --compute_classification_metrics \
  --classification_n_classes <N_CLASSES>

# For binary classification
openai api fine_tunes.create \
  -t <TRAIN_FILE_ID_OR_PATH> \
  -v <VALIDATION_FILE_OR_PATH> \
  -m <MODEL> \
  --compute_classification_metrics \
  --classification_n_classes 2 \
  --classification_positive_class <POSITIVE_CLASS_FROM_DATASET>

The following metrics will be displayed in your results file if you set --compute_classification_metrics:

For multiclass classification

- classification/accuracy: accuracy
- classification/weighted_f1_score: weighted F-1 score

For binary classification 对于二元分类

The following metrics are based on a classification threshold of 0.5 (i.e. when the probability is > 0.5, an example is classified as belonging to the positive class.)
以下度量基于分类阈值0.5（即，当概率〉0.5时，样本被分类为属于正类）。

classification/accuracy 准确度
classification/precision 精确率
classification/recall 召回率
classification/f{beta} F-beta 得分
classification/auroc - AUROC（ROC 曲线下面积）
classification/auprc - AUPRC（精确率-召回率曲线下面积）
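A minimal sketch of how these numbers follow from the 0.5 threshold, assuming `probs` holds the model's probability for the positive class and `labels` marks the positive class as 1 and everything else as 0. F-beta treats recall as beta times as important as precision (beta = 1 gives F-1). AUROC is computed here as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. AUPRC is left out because several estimators exist and the text does not say which is used; `binary_metrics` is a hypothetical helper.

```python
from typing import Dict, Sequence

def binary_metrics(
    probs: Sequence[float], labels: Sequence[int], beta: float = 1.0, threshold: float = 0.5
) -> Dict[str, float]:
    """Threshold-based metrics plus a pairwise AUROC for a binary classifier."""
    preds = [1 if p > threshold else 0 for p in probs]  # strictly above 0.5 means positive
    tp = sum(y == 1 and yh == 1 for y, yh in zip(labels, preds))
    fp = sum(y == 0 and yh == 1 for y, yh in zip(labels, preds))
    fn = sum(y == 1 and yh == 0 for y, yh in zip(labels, preds))
    tn = len(labels) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    # AUROC: share of (positive, negative) pairs ranked correctly; ties count half.
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    pairs = [1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg]
    auroc = sum(pairs) / len(pairs) if pairs else float("nan")
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "recall": recall,
        "fbeta": fbeta,
        "auroc": auroc,
    }

m = binary_metrics([0.9, 0.6, 0.4, 0.3, 0.7], [1, 1, 1, 0, 0])
print(m)
```

In this toy batch two positives and one negative exceed 0.5, so precision and recall are both 2/3, and 4 of the 6 positive/negative pairs are ordered correctly.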

Note that these evaluations assume that you are using text labels for classes that tokenize down to a single token, as described above. If these conditions do not hold, the numbers you get will likely be wrong.
请注意，这些计算假设您正在为标记化为单个标记的类使用文本标签，如上所述。如果这些条件不成立，你得到的数字很可能是错误的。

Validation 验证

You can reserve some of your data for validation. A validation file has exactly the same format as a train file, and your train and validation data should be mutually exclusive.
您可以保留一些数据用于验证。验证文件与训练文件具有完全相同的格式，训练和验证数据应该互斥。
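One way to get mutually exclusive files is to deduplicate, shuffle once with a fixed seed, and slice. A minimal sketch, assuming each example is a dict with `prompt` and `completion` keys; the 20% validation share, the seed, and the helper names are illustrative choices, not values from the guide.

```python
import json
import random
from typing import Dict, List, Tuple

Example = Dict[str, str]

def split_train_validation(
    examples: List[Example], valid_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[Example], List[Example]]:
    """Return (train, validation) lists that share no prompt/completion pair."""
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must be strictly between 0 and 1")
    # Drop exact duplicates first; otherwise one example could end up in both files.
    unique = list({(e["prompt"], e["completion"]): e for e in examples}.values())
    random.Random(seed).shuffle(unique)  # seeded, so the split is reproducible
    n_valid = max(1, int(len(unique) * valid_fraction))
    return unique[n_valid:], unique[:n_valid]

def write_jsonl(path: str, rows: List[Example]) -> None:
    """Write one JSON object per line, the same layout as the training file."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

# Hypothetical usage:
# train, valid = split_train_validation(all_examples)
# write_jsonl("train.jsonl", train)
# write_jsonl("valid.jsonl", valid)
```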

If you include a validation file when creating your fine-tune job, the generated results file will include evaluations on how well the fine-tuned model performs against your validation data at periodic intervals during training.
如果在创建微调作业时包含验证文件，生成的结果文件将包含训练期间定期评估的微调模型在验证数据上的表现。

OpenAI CLI:

openai api fine_tunes.create -t <TRAIN_FILE_ID_OR_PATH> \
  -v <VALIDATION_FILE_ID_OR_PATH> \
  -m <MODEL>

If you provided a validation file, we periodically calculate metrics on batches of validation data during training time. You will see the following additional metrics in your results file:
如果您提供了验证文件，我们会在训练期间定期计算验证数据批次的指标。您将在结果文件中看到以下附加指标：

validation_loss: loss on the validation batch
validation_loss：验证批次上的损失
validation_sequence_accuracy: the percentage of completions in the validation batch for which the model’s predicted tokens matched the true completion tokens exactly. For example, with a batch_size of 3, if your data contains the completion [[1, 2], [0, 5], [4, 2]] and the model predicted [[1, 1], [0, 5], [4, 2]], this accuracy will be 2/3 = 0.67
validation_sequence_accuracy（验证序列准确度）：验证批次中，模型预测标记与真实补全标记完全一致的补全所占的百分比。例如，batch_size 为 3 时，如果数据包含补全 [[1, 2], [0, 5], [4, 2]]，而模型预测为 [[1, 1], [0, 5], [4, 2]]，则该准确度为 2/3 = 0.67。
validation_token_accuracy: the percentage of tokens in the validation batch that were correctly predicted by the model. For example, with a batch_size of 3, if your data contains the completion [[1, 2], [0, 5], [4, 2]] and the model predicted [[1, 1], [0, 5], [4, 2]], this accuracy will be 5/6 = 0.83
validation_token_accuracy（验证标记准确度）：验证批次中被模型正确预测的标记所占的百分比。例如，batch_size 为 3 时，如果数据包含补全 [[1, 2], [0, 5], [4, 2]]，而模型预测为 [[1, 1], [0, 5], [4, 2]]，则该准确度为 5/6 = 0.83。
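The two worked examples above can be reproduced directly. A minimal sketch, assuming each predicted completion has the same number of tokens as the true one (token IDs are plain integers; function names are illustrative):

```python
from typing import Sequence

def sequence_accuracy(true: Sequence[Sequence[int]], pred: Sequence[Sequence[int]]) -> float:
    """Fraction of completions whose predicted tokens match the true tokens exactly."""
    matches = sum(list(t) == list(p) for t, p in zip(true, pred))
    return matches / len(true)

def token_accuracy(true: Sequence[Sequence[int]], pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual tokens, across the whole batch, predicted correctly."""
    total = sum(len(t) for t in true)
    correct = sum(a == b for t, p in zip(true, pred) for a, b in zip(t, p))
    return correct / total

true = [[1, 2], [0, 5], [4, 2]]
pred = [[1, 1], [0, 5], [4, 2]]
print(round(sequence_accuracy(true, pred), 2))  # 0.67
print(round(token_accuracy(true, pred), 2))     # 0.83
```

One wrong token makes the whole first completion count as a miss for sequence accuracy (2 of 3 match), while token accuracy only loses that one token (5 of 6 match).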

Hyperparameters 超参数

We’ve picked default hyperparameters that work well across a range of use cases. The only required parameter is the training file.
我们选择了在一系列用例中工作良好的默认超参数。唯一必需的参数是训练文件。

That said, tweaking the hyperparameters used for fine-tuning can often lead to a model that produces higher quality output. In particular, you may want to configure the following:
不过，调整用于微调的超参数通常能让模型产生更高质量的输出。特别是，您可能需要配置以下各项：

model: The name of the base model to fine-tune. You can select one of “ada”, “babbage”, “curie”, or “davinci”. To learn more about these models, see the Models documentation.
model ：要微调的基础模型的名称。您可以选择“ada”、“babbage”、“curie”或“davinci”之一。要了解有关这些模型的详细信息，请参阅模型文档。
n_epochs - defaults to 4. The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.
n_epochs - 默认为 4。训练模型的轮数（epoch）。一个 epoch 指完整遍历一次训练数据集。
batch_size - defaults to ~0.2% of the number of examples in the training set, capped at 256. The batch size is the number of training examples used to train a single forward and backward pass. In general, we’ve found that larger batch sizes tend to work better for larger datasets.
batch_size - 默认为训练集中示例数的约 0.2%，上限为 256。批量大小是单次前向和反向传播所使用的训练示例数量。一般来说，我们发现较大的批量大小往往更适合较大的数据集。
learning_rate_multiplier - defaults to 0.05, 0.1, or 0.2 depending on final batch_size. The fine-tuning learning rate is the original learning rate used for pretraining multiplied by this multiplier. We recommend experimenting with values in the range 0.02 to 0.2 to see what produces the best results. Empirically, we’ve found that larger learning rates often perform better with larger batch sizes.
learning_rate_multiplier -默认为0.05、0.1或0.2，具体取决于最终的 batch_size 。微调学习速率是用于预训练的原始学习速率乘以此乘数。我们建议使用0.02到0.2范围内的值进行试验，以查看哪个值可产生最佳效果。经验上，我们发现较大的学习率通常在较大的批处理规模下表现得更好。
compute_classification_metrics - defaults to False. If True, for fine-tuning for classification tasks, computes classification-specific metrics (accuracy, F-1 score, etc.) on the validation set at the end of every epoch.
compute_classification_metrics - 默认为 False。如果为 True，则在微调分类任务时，于每个 epoch 结束时在验证集上计算分类特定指标（准确度、F-1 得分等）。
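A rough back-of-the-envelope sketch of what the batch-size and epoch defaults imply for the length of a run. The text only says the default batch size is about 0.2% of the training examples, capped at 256, so the rounding and the floor of 1 below are assumptions; the step count is just the number of batches needed to cover the data set once, times the number of epochs.

```python
import math

def approx_default_batch_size(n_examples: int) -> int:
    """About 0.2% of the training set, capped at 256 (rounding and floor of 1 are assumed)."""
    return max(1, min(256, round(0.002 * n_examples)))

def approx_training_steps(n_examples: int, n_epochs: int = 4) -> int:
    """Forward/backward passes: batches per epoch times the number of epochs."""
    batch_size = approx_default_batch_size(n_examples)
    return n_epochs * math.ceil(n_examples / batch_size)

for n in (1_000, 10_000, 200_000):
    print(n, approx_default_batch_size(n), approx_training_steps(n))
```

For very large data sets the cap dominates: 200,000 examples would give roughly 400 by the 0.2% rule, but the batch size stays at 256.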

To configure these additional hyperparameters, pass them in via command line flags on the OpenAI CLI, for example:
要配置这些额外的超参数，请通过OpenAI CLI上的命令行标志将它们传入，例如：

openai api fine_tunes.create \
  -t file-JD89ePi5KMsB3Tayeli5ovfW \
  -m ada \
  --n_epochs 1

Continue fine-tuning from a fine-tuned model 从微调模型继续微调

If you have already fine-tuned a model for your task and now have additional training data that you would like to incorporate, you can continue fine-tuning from the model. This creates a model that has learned from all of the training data without having to re-train from scratch.
如果您已经针对任务微调了模型，现在又有要加入的其他训练数据，则可以从该模型继续微调。这样得到的模型会从所有训练数据中学习，而不必从头开始重新训练。

To do this, pass in the fine-tuned model name when creating a new fine-tuning job (e.g. -m curie:ft-<org>-<date>). Other training parameters do not have to be changed, however if your new training data is much smaller than your previous training data, you may find it useful to reduce learning_rate_multiplier by a factor of 2 to 4.
为此，在创建新的微调作业时传入微调后的模型名称（例如 -m curie:ft-<org>-<date> ）。其他训练参数不必更改，但是如果新的训练数据比以前的训练数据小得多，您可能会发现将 learning_rate_multiplier 减少2到4倍是有用的。
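If you launch jobs from a script, the continued fine-tune is the same `fine_tunes.create` call with the fine-tuned model name passed to `-m`. A minimal sketch that only assembles the argument list; `build_fine_tune_command`, `new_data.jsonl`, and the previous multiplier of 0.1 are hypothetical, and the `--learning_rate_multiplier` flag is assumed to follow the same pass-the-hyperparameter-as-a-flag pattern as `--n_epochs`.

```python
import shlex
from typing import List, Optional

def build_fine_tune_command(
    train_file: str,
    model: str,
    validation_file: Optional[str] = None,
    n_epochs: Optional[int] = None,
    learning_rate_multiplier: Optional[float] = None,
) -> List[str]:
    """Assemble argv for the `openai api fine_tunes.create` command."""
    cmd = ["openai", "api", "fine_tunes.create", "-t", train_file, "-m", model]
    if validation_file is not None:
        cmd += ["-v", validation_file]
    if n_epochs is not None:
        cmd += ["--n_epochs", str(n_epochs)]
    if learning_rate_multiplier is not None:
        cmd += ["--learning_rate_multiplier", str(learning_rate_multiplier)]
    return cmd

# Continue from an existing fine-tune (placeholder name from the text) and halve
# the multiplier because the new data set is much smaller than the original one.
previous_multiplier = 0.1  # hypothetical value used for the earlier job
cmd = build_fine_tune_command(
    "new_data.jsonl", "curie:ft-<org>-<date>",
    learning_rate_multiplier=previous_multiplier / 2,
)
print(shlex.join(cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```

Building a list instead of a single string avoids shell-quoting problems with the `<` and `>` characters in the model name.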

Weights & Biases 权重和偏差

You can sync your fine-tunes with Weights & Biases to track experiments, models, and datasets.
您可以将微调与 Weights & Biases 同步，以跟踪实验、模型和数据集。

To get started, you will need a Weights & Biases account and a paid OpenAI plan. To make sure you are using the latest version of openai and wandb, run:
要开始，您需要一个Weights & Biases账户和一个付费OpenAI计划。要确保您使用的是 openai 和 wandb 的最新版本，请运行：

pip install --upgrade openai wandb

To sync your fine-tunes with Weights & Biases, run:
若要将微调与 Weights & Biases 同步，请运行：

openai wandb sync

You can read the Weights & Biases documentation for more information on this integration.
有关此集成的详细信息，请阅读 Weights & Biases 文档。

Example notebooks notebooks格式示例

Classification 分类

finetuning-classification.ipynb

This notebook will demonstrate how to fine-tune a model that can classify whether a piece of input text is related to Baseball or Hockey. We will perform this task in four steps in the notebook:
本笔记本将演示如何微调一个模型，用于判断一段输入文本与棒球（Baseball）相关还是与冰球（Hockey）相关。我们将在笔记本中分四步执行此任务：

Data exploration will give an overview of the data source and what an example looks like
数据探索将概述数据源，并展示示例的样子
Data preparation will turn our data source into a jsonl file that can be used for fine-tuning
数据准备将把我们的数据源转换成一个jsonl文件，可以用来进行微调
Fine-tuning will kick off the fine-tuning job and explain the resulting model’s performance
微调将启动微调作业并解释生成的模型的性能
Using the model will demonstrate making requests to the fine-tuned model to get predictions.
使用该模型将演示如何向经过微调的模型发出请求以获得预测。
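The data-preparation step might look roughly like the sketch below. It assumes the prompt/completion JSONL layout, a fixed `\n\n###\n\n` separator appended to each prompt, and a completion that starts with a space; these conventions are assumed to match the general classification guidelines earlier in this guide and are not taken from the notebook itself. `to_jsonl_lines` is a hypothetical helper.

```python
import json
from typing import Iterable, List, Tuple

SEPARATOR = "\n\n###\n\n"  # assumed fixed separator marking the end of every prompt

def to_jsonl_lines(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Turn (text, label) pairs into JSONL lines for a classification fine-tune."""
    lines = []
    for text, label in pairs:
        record = {
            "prompt": text.strip() + SEPARATOR,
            # Leading space before the label, as the classification guidelines suggest.
            "completion": " " + label.strip(),
        }
        lines.append(json.dumps(record))
    return lines

for line in to_jsonl_lines([("Great save by the goalie! ", "hockey"), ("Walk-off homer.", "baseball")]):
    print(line)
```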

Question answering 问答

olympics-1-collect-data.ipynb

olympics-2-create-qa.ipynb

olympics-3-train-qa.ipynb

The idea of this project is to create a question answering model, based on a few paragraphs of provided text. Base GPT-3 models do a good job at answering questions when the answer is contained within the paragraph; however, if the answer isn’t contained, the base models tend to try their best to answer anyway, often leading to confabulated answers.
这个项目的想法是创建一个问答模型，基于提供的文本的几个段落。当答案包含在段落中时，基本GPT-3模型在回答问题方面做得很好，但是如果答案不包含在段落中，基本模型倾向于尽最大努力回答，通常导致虚构的答案。

To create a model which answers questions only if there is sufficient context for doing so, we first create a dataset of questions and answers based on paragraphs of text. In order to train the model to answer only when the answer is present, we also add adversarial examples, where the question doesn’t match the context. In those cases, we ask the model to output “No sufficient context for answering the question”.
要创建一个模型，使其仅在有足够上下文的情况下才回答问题，我们首先基于文本段落创建一个问题和答案的数据集。为了训练模型仅在答案存在时才回答，我们还添加了对抗性示例，其中问题与上下文不匹配。在这些情况下，我们要求模型输出“没有足够的上下文来回答问题”。

We will perform this task in three notebooks:
我们将在三个笔记本中执行此任务：

The first notebook focuses on collecting recent data, which GPT-3 didn’t see during its pre-training. We picked the topic of the 2020 Olympic Games (which actually took place in the summer of 2021), and downloaded 713 unique pages. We organized the dataset by individual sections, which will serve as context for asking and answering the questions.
第一个笔记本侧重于收集 GPT-3 在预训练期间没有见过的近期数据。我们选择了 2020 年奥运会这一主题（实际上于 2021 年夏天举行），并下载了 713 个不同的页面。我们按单独的章节组织数据集，这些章节将作为提问和回答问题的上下文。
The second notebook will utilize Davinci-instruct to ask a few questions based on a Wikipedia section, as well as answer those questions, based on that section.
第二个笔记本将利用Davinci-instruct根据维基百科的一个部分提出一些问题，并根据该部分回答这些问题。
The third notebook will utilize the dataset of context, question and answer pairs to additionally create adversarial questions and context pairs, where the question was not generated on that context. In those cases the model will be prompted to answer “No sufficient context for answering the question”. We will also train a discriminator model, which predicts whether the question can be answered based on the context or not.
第三个笔记本将利用上下文、问题和答案对的数据集来另外创建对抗性问题和上下文对，其中问题不是在该上下文中生成的。在这些情况下，模型将被提示回答“没有足够的上下文来回答问题”。我们还将训练一个鉴别器模型，它可以根据上下文来预测问题是否可以被回答。
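The adversarial-example idea from the third notebook can be sketched in a few lines: pair each question with a context it was not generated from and make the target answer the fixed refusal string. This is a minimal illustration, not the notebook's code; the `context`/`question`/`answer` schema, the random choice of mismatched context, and `add_adversarial_examples` are assumptions.

```python
import random
from typing import Dict, List

NO_ANSWER = "No sufficient context for answering the question"

def add_adversarial_examples(rows: List[Dict[str, str]], seed: int = 0) -> List[Dict[str, str]]:
    """Append one mismatched (context, question) pair per row, answered with NO_ANSWER."""
    rng = random.Random(seed)
    out = list(rows)
    for i, row in enumerate(rows):
        # Candidate contexts: any other row's context that differs from this row's own.
        candidates = [
            r["context"] for j, r in enumerate(rows)
            if j != i and r["context"] != row["context"]
        ]
        if not candidates:
            continue  # nothing to mismatch against
        out.append({
            "context": rng.choice(candidates),
            "question": row["question"],
            "answer": NO_ANSWER,
        })
    return out

rows = [
    {"context": "Section on swimming.", "question": "Who won the 100m freestyle?", "answer": "..."},
    {"context": "Section on judo.", "question": "Who won the judo team event?", "answer": "..."},
    {"context": "Section on skateboarding.", "question": "When did skateboarding debut?", "answer": "..."},
]
dataset = add_adversarial_examples(rows)
print(len(dataset))
```

The same mismatched pairs, labeled answerable or not, are the kind of data a discriminator model could be trained on.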

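Other resources 其它资料下载

If you would like to keep learning about AI learning paths and the overall body of knowledge, see my other blog post 《重磅 | 完备的人工智能AI 学习——基础知识学习路线，所有资料免关注免套路直接网盘下载》 (a complete AI learning roadmap for the fundamentals, with all materials available for direct cloud-drive download). It draws on well-known GitHub open-source projects, AI technology platforms, and experts in the field, including Datawhale, ApacheCN, AI有道, and Dr. 黄海广, and collects nearly 100 GB of related material that I hope will help everyone.
如果大家想继续了解人工智能相关学习路线和知识体系，欢迎大家翻阅我的另外一篇博客《重磅 | 完备的人工智能AI 学习——基础知识学习路线，所有资料免关注免套路直接网盘下载》
这篇博客参考了 GitHub 知名开源平台、AI 技术平台以及相关领域专家：Datawhale、ApacheCN、AI有道和黄海广博士等，约有近 100G 相关资料，希望能帮助到所有小伙伴们。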