OpenAI Opens Up gpt-3.5-turbo Fine-Tuning: A Hands-On Tutorial

news2026/2/15 11:20:40

Contents

- Introduction to OpenAI fine-tuning
    - OpenAI fine-tuning page
    - Preparing a dataset in JSONL format
    - Click to upload the file

Introduction to OpenAI fine-tuning

OpenAI fine-tuning page

URL: https://platform.openai.com/finetune


Preparing a dataset in JSONL format

This tutorial uses the Chinese-medical-dialogue-data dataset.
Download it with git clone:

git clone https://github.com/Toyhom/Chinese-medical-dialogue-data

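A subset of the cardiovascular department (心血管科) data is selected for fine-tuning.
Fine-tuning is a paid service: the more tokens in the training data, the higher the cost. In addition, gpt-3.5-turbo accepts at most 4096 tokens, so each training example must fit within that limit.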
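
As a rough illustration of how the bill scales with token volume, the sketch below multiplies the total training tokens by the number of epochs and a per-1K-token price, and flags examples that exceed the 4096-token limit. The function names `estimate_training_cost` and `oversized_examples`, the price, the epoch count, and the token counts are all hypothetical placeholders rather than OpenAI's actual rates; check the official pricing page for current values.

```python
# Rough cost sketch. All numbers here are placeholders, not real OpenAI prices.
MAX_TOKENS_PER_EXAMPLE = 4096  # per-example limit stated in this tutorial


def estimate_training_cost(token_counts, n_epochs, price_per_1k_tokens):
    """Return (billed_tokens, cost) for a list of per-example token counts."""
    billed_tokens = sum(token_counts) * n_epochs
    cost = billed_tokens / 1000 * price_per_1k_tokens
    return billed_tokens, cost


def oversized_examples(token_counts, limit=MAX_TOKENS_PER_EXAMPLE):
    """Return the indices of examples whose token count exceeds the limit."""
    return [i for i, n in enumerate(token_counts) if n > limit]


# Hypothetical token counts for four training examples
counts = [300, 520, 4500, 180]
billed, cost = estimate_training_cost(counts, n_epochs=3, price_per_1k_tokens=0.01)
print(billed, cost)
print(oversized_examples(counts))
```

Real token counts would have to come from a tokenizer; the list above only stands in for them.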
Load the CSV file into a DataFrame:

import pandas as pd

# The sample file is GBK-encoded, so the encoding must be given explicitly
df = pd.read_csv('Chinese-medical-dialogue-data/样例_内科5000-6000.csv', encoding='gbk')

df

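Extract samples

# Keep only the cardiovascular department (心血管科) rows
cardio = df[df['department'] == '心血管科']
train_data = cardio.iloc[0:50, :]   # first 50 rows for training
valid_data = cardio.iloc[50:70, :]  # next 20 rows for validation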

train_data

Build the data in JSONL format

lis1 = []  # training examples
lis2 = []  # validation examples
sys_content = "You are a specialist in cardiovascular disease and you will apply your expertise to give your specialized answers to patients."

# Each example is a list of three chat messages: system, user, assistant
for index, row in train_data.iterrows():
    each = []
    each.append({"role": "system", "content": sys_content})
    each.append({"role": "user", "content": row['ask']})
    each.append({"role": "assistant", "content": row['answer']})
    lis1.append(each)

# Build the validation examples the same way
for index, row in valid_data.iterrows():
    each = []
    each.append({"role": "system", "content": sys_content})
    each.append({"role": "user", "content": row['ask']})
    each.append({"role": "assistant", "content": row['answer']})
    lis2.append(each)

lis1
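
For reference, the chat fine-tuning format expects each line of the JSONL file to be a JSON object whose `messages` key holds a list like the ones built above. The following is a minimal sketch of a sanity check on that shape. `wrap_example` and `validate_example` are hypothetical helper names, and the sample question and answer are invented for illustration, not taken from the dataset.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def wrap_example(messages):
    """Wrap a list of chat messages in the {"messages": [...]} record shape."""
    return {"messages": messages}


def validate_example(record):
    """Return a list of problems found in one record (empty list means OK)."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    problems = []
    for i, msg in enumerate(messages):
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {msg.get('role')!r}")
        content = msg.get("content")
        # A missing CSV cell arrives from pandas as a float NaN, not a string
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i}: empty or non-string content")
    if not any(m.get("role") == "assistant" for m in messages):
        problems.append("no assistant message to learn from")
    return problems


# Invented sample data standing in for one element of lis1
each = [
    {"role": "system", "content": "You are a specialist in cardiovascular disease."},
    {"role": "user", "content": "What should I watch for with high blood pressure?"},
    {"role": "assistant", "content": "Monitor your blood pressure regularly."},
]
good = wrap_example(each)
bad = wrap_example([{"role": "user", "content": float("nan")}])
print(validate_example(good))
print(validate_example(bad))
```

Running a check like this over every element of `lis1` and `lis2` before uploading can catch rows where the `ask` or `answer` column was empty.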

Export the JSONL data

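
The export step could look roughly like the sketch below, which writes one JSON object per line. It assumes each element of `lis1` and `lis2` is a list of message dicts as built earlier. The helper names `write_jsonl` and `read_jsonl`, the output file name `train.jsonl`, and the stand-in sample data are assumptions for illustration. Passing `ensure_ascii=False` keeps the Chinese text readable in the file instead of escaping it.

```python
import json
import os
import tempfile


def write_jsonl(examples, path):
    """Write each list of chat messages as one {"messages": [...]} JSON line."""
    with open(path, "w", encoding="utf-8") as f:
        for messages in examples:
            f.write(json.dumps({"messages": messages}, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back to confirm that every line parses as JSON."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Stand-in for lis1 (invented sample data, not from the dataset)
sample_lis1 = [[
    {"role": "system", "content": "You are a specialist in cardiovascular disease."},
    {"role": "user", "content": "高血压患者需要注意什么？"},
    {"role": "assistant", "content": "建议低盐饮食并定期监测血压。"},
]]

out_dir = tempfile.mkdtemp()  # temporary folder so the sketch leaves no stray files
train_path = os.path.join(out_dir, "train.jsonl")
write_jsonl(sample_lis1, train_path)
records = read_jsonl(train_path)
print(len(records), records[0]["messages"][1]["content"])
```

With the real data, the calls would be `write_jsonl(lis1, 'train.jsonl')` and `write_jsonl(lis2, 'valid.jsonl')`, producing the two files to upload.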