【深度学习】微调Qwen1.8B

1.前言

使用地址数据微调Qwen1.8B。Qwen提供了预构建的Docker镜像，在使用时获取镜像只需安装驱动、下载模型文件即可启动Demo、部署OpenAI API以及进行微调。

github地址：GitHub - QwenLM/Qwen: The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

镜像地址：https://hub.docker.com/r/qwenllm/qwen/tags

获取方式：docker pull qwenllm/qwen:cu117

2.微调过程

Qwen的介绍中给出了详细的微调教程。可访问千问github中的微调章节查看。如果使用预先准备的docker，微调则更为方便。

2.1 准备镜像

需要注意的是：在官方提供的docker镜像中，运行docker run 镜像成为容器后，会启动镜像中的python服务。Dockerfile中最后一行命令为"CMD ["python3" "web_demo.py" "--server-port" "80"]。我们在使用容器微调时，不需要让容器中开启服务，所以需要以官方提供的镜像为基础，再做一个镜像。 Dockerfile内容如下。

FROM qwenllm/qwen:cu117

# CMD 设置为一个空命令，这样可以在容器启动时不执行任何操作
CMD ["/bin/sh", "-c", "tail -f /dev/null"]

#创建镜像
docker build -t qwenllm/qwen:cu117_V1 .

2.2 准备数据

首先，需要准备训练数据。需要将所有样本放到一个列表中并存入json文件中。每个样本对应一个字典，包含id和conversation，其中后者为一个列表。示例如下所示：

[
  {
    "id": "identity_0",
    "conversations": [
      {
        "from": "user",
        "value": "你好"
      },
      {
        "from": "assistant",
        "value": "我是一个语言模型，我叫通义千问。"
      }
    ]
  }
]

本次微调的训练为指令微调，数据示例如下：

[
    {
        "id": "identity_0",
        "conversations": [
            {
                "from": "user",
                "value": "识别以下句子中的地址信息，并按照{address:['地址']}的格式返回。如果没有地址，返回{address:[]}。句子为：在一本关于人文的杂志中，我们发现了一篇介绍北京市海淀区科学院南路76号社区服务中心一层的文章，文章深入探讨了该地点的人文历史背景以及其对于当地居民的影响。"
            },
            {
                "from": "assistant",
                "value": "{\"address\":\"北京市海淀区科学院南路76号社区服务中心一层\"}"
            }
        ]
    },
    {
        "id": "identity_1",
        "conversations": [
            {
                "from": "user",
                "value": "识别以下句子中的地址信息，并按照{address:['地址']}的格式返回。如果没有地址，返回{address:[]}。句子为：近日，位于北京市房山区政通路13号的某儿童教育机构因出色的育儿理念和创新教学方法引起了广泛关注。"
            },
            {
                "from": "assistant",
                "value": "{\"address\":\"北京市房山区政通路13号\"}"
            }
        ]
    },
    {
        "id": "identity_2",
        "conversations": [
            {
                "from": "user",
                "value": "识别以下句子中的地址信息，并按照{address:['地址']}的格式返回。如果没有地址，返回{address:[]}。句子为：在军事领域中，位于朝阳区惠民园4号楼底商的某单位，一直致力于各种研究和发展工作，以保障国家安全和稳定。"
            },
            {
                "from": "assistant",
                "value": "{\"address\":\"朝阳区惠民园4号楼底商\"}"
            }
        ]
    }
]

2.3 微调

2.3.1 微调方法1

我们借助Qwen给出的docker进行微调。图为使用docker进行微调的示例。

IMAGE_NAME=qwenllm/qwen:cu117
CHECKPOINT_PATH='/ssd/dongzhenheng/LLM/Qwen-1_8B-Chat'               # 下载的模型和代码路径
#CHECKPOINT_PATH=/path/to/Qwen-7B-Chat-Int4     # 下载的模型和代码路径 (Q-LoRA)
DATA_PATH='/data/zhenhengdong/WORk/Fine-tuning/Codes/'                   # 准备微调数据放在 ${DATA_PATH}/example.json #data.json
OUTPUT_PATH='/ssd/dongzhenheng/LLM/Qwen-Address/tezt'          # 微调输出路径

# 默认使用主机所有GPU
DEVICE=all
# 如果需要指定用于训练的GPU，按照以下方式设置device（注意：内层的引号不可省略）
#DEVICE='"device=0,3"'

mkdir -p ${OUTPUT_PATH}

# 单卡LoRA微调
docker run --gpus ${DEVICE} --rm --name qwen \
    --mount type=bind,source=${CHECKPOINT_PATH},target=/data/shared/Qwen/Qwen-7B \
    --mount type=bind,source=${DATA_PATH},target=/data/shared/Qwen/data \
    --mount type=bind,source=${OUTPUT_PATH},target=/data/shared/Qwen/output_qwen \
    --shm-size=2gb \
    -it ${IMAGE_NAME} \
    bash finetune/finetune_lora_single_gpu.sh -m /data/shared/Qwen/Qwen-7B/ -d /data/shared/Qwen/data/data.json
    #bash finetune/finetune_lora_ds.sh -m /data/shared/Qwen/Qwen-7B/ -d /data/shared/Qwen/data/data.json
    #bash finetune/finetune_lora_ds.sh -m /data/shared/Qwen/Qwen-7B/ -d /data/shared/Qwen/data/data.json
    #bash finetune/finetune_lora_ds.sh -m /data/shared/Qwen/Qwen-7B/ -d /data/shared/Qwen/data/data.json

微调时间较长，运行时可以新启动一个screen。

微调结束后会在指定的输出目录下输出adapter的相关文件。

2.3.2 微调方法2

微调方法1中直接运行了finetune/finetune_lora_single_gpu.sh脚本进行微调。finetune_lora_single_gpu.sh中的内容如下。

#!/bin/bash
export CUDA_DEVICE_MAX_CONNECTIONS=1

MODEL="your model path" # Set the path if you do not want to load from huggingface directly
# ATTENTION: specify the path to your training data, which should be a json file consisting of a list of conversations.
# See the section for finetuning in README for more information.
DATA="your data path"

function usage() {
    echo '
Usage: bash finetune/finetune_lora_single_gpu.sh [-m MODEL_PATH] [-d DATA_PATH]
'
}

while [[ "$1" != "" ]]; do
    case $1 in
        -m | --model )
            shift
            MODEL=$1
            ;;
        -d | --data )
            shift
            DATA=$1
            ;;
        -h | --help )
            usage
            exit 0
            ;;
        * )
            echo "Unknown argument ${1}"
            exit 1
            ;;
    esac
    shift
done

export CUDA_VISIBLE_DEVICES=3

python finetune.py \
  --model_name_or_path $MODEL \
  --data_path $DATA \
  --bf16 True \
  --output_dir output_qwen \
  --num_train_epochs 50\
  --per_device_train_batch_size 2 \
  --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 8 \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 1000 \
  --save_total_limit 5 \
  --learning_rate 3e-4 \
  --weight_decay 0.1 \
  --adam_beta2 0.95 \
  --warmup_ratio 0.01 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --report_to "none" \
  --model_max_length 512 \
  --lazy_preprocess True \
  --gradient_checkpointing \
  --use_lora

# If you use fp16 instead of bf16, you should use deepspeed
# --fp16 True --deepspeed finetune/ds_config_zero2.json

各个参数的解释如下：

python finetune.py: 运行脚本finetune.py，用于微调模型。

--model_name_or_path $MODEL: 指定预训练模型的名称或路径。$MODEL是一个变量，将在运行时替换为实际的模型路径。

--data_path $DATA: 指定向量化的训练数据集路径。$DATA是一个变量，将替换为实际的数据集路径。

--bf16 True: 使用BF16（Brain Floating Point 16）精度进行训练，以降低内存消耗和加速计算。

--output_dir output_qwen: 指定模型训练完成后输出文件的目录名。

--num_train_epochs 50: 设置训练轮数为50轮。

--per_device_train_batch_size 2: 每个设备上的训练批次大小为2，即每次在每个GPU上处理2个样本。

--per_device_eval_batch_size 1: 每个设备上的评估批次大小为1。

 --gradient_accumulation_steps 8: 累积8个步骤的梯度后再更新权重，等效于增大了训练批次大小。

 --evaluation_strategy "no": 不在训练过程中执行评估。

 --save_strategy "steps": 根据步数保存模型，而不是按时间间隔保存。

 --save_steps 1000: 每训练1000个步后保存一次模型。

 --save_total_limit 5: 限制最多保存最近的5个模型检查点。

 --learning_rate 3e-4：指定学习率（learning rate），这是优化器更新权重时使用的步长。值为3乘以10的负4次方，意味着训练过程中的学习速率相对较小，有助于更精细地调整模型参数。

 --weight_decay 0.1：L2正则化系数，也称为权重衰减（weight decay）。它用于防止模型过拟合，通过对权重矩阵施加惩罚来约束模型复杂度。

 --adam_beta2 0.95：Adam优化器中的第二个动量项的指数衰减率（beta2）。Adam是一种常用的优化算法，该参数影响了历史梯度平方项的累积速度。

 --warmup_ratio 0.01：学习率预热比例，用于学习率调度器中。这意味着在训练开始阶段会有一个学习率逐渐增大的“预热”阶段，其长度占整个训练周期的1%。

 --lr_scheduler_type "cosine"：学习率调度器类型，这里使用的是余弦退火（Cosine Annealing）策略。在训练过程中，学习率会按照余弦函数的变化规律进行动态调整。

 --logging_steps 1：指定每多少个步骤输出一次日志信息。设置为1意味着每次迭代（通常是每个训练批次后）都会记录训练状态或指标。

 --report_to "none"：设置训练结果报告的位置或方式。"none"意味着不向任何工具或平台报告进度或指标。

 --model_max_length 512：定义模型处理的最大序列长度。这意味着输入数据将被截断或者填充到不超过512个token。

 --lazy_preprocess True：如果支持的话，启用延迟预处理模式。在这种模式下，数据预处理将在需要时而不是一次性全部完成，可以减少内存占用。

 --gradient_checkpointing：启用梯度检查点功能，这是一种内存优化技术，通过存储和恢复中间层的激活以节省内存，特别适用于大模型训练。

 --use_lora：表示在微调过程中应用LoRA（Low-Rank Adaptation）方法，这是一种针对大规模模型参数高效的微调技术，通过引入低秩适配器来更新模型权重，而不直接修改所有参数，从而减少计算资源消耗。

如果想更改finetune.py的参数，那就需要进入进入容器中微调。可以在容器外修改finetune_lora_single_gpu.sh文件中的参数，比如训练的轮次、训练的批次、训练多少步后保存模型、模型处理的最大序列长度等。修改后将文件cp到容器中。具体过程如下：

#启动docker
docker run --gpus all -v /ssd/dongzhenheng/LLM/Qwen-14B-Chat:/data/shared/Qwen/Qwen-Chat/ -d qwenllm/qwen:cu121_V1
#查看容器id
docker ps
#根据容器id进入容器
docker exec -it 容器id bash
#cp 进去data
docker cp /data/zhenhengdong/WORk/Fine-tuning/Qwen-14B/Datasets/data_1.json b81df07711cf:/data/shared/Qwen
#修改checkpoint.py,注释掉waring。避免在运行的时候出现很长的wraing
docker cp /data/zhenhengdong/WORk/Fine-tuning/Qwen-14B/Codes/checkpoint.py b81df07711cf:/usr/local/lib/python3.8/dist-packages/torch/utils
#修改finetune_lora_single_gpu.sh 中的参数
docker cp /data/zhenhengdong/WORk/Fine-tuning/Qwen-14B/Codes/finetune_lora_single_gpu_1.sh  b81df07711cf:/data/shared/Qwen/finetune
# 运行 因为在finetune_lora_single_gpu.sh 中指定了模型地址和数据地址，所以在运行的时候也可以不用指定 -d 和 -m了
bash finetune/finetune_lora_single_gpu_1.sh  -m /data/shared/Qwen/Qwen-Chat -d /data/shared/Qwen/data_1.json

微调结束后，会生成output_qwen的文件夹。

里面是模型微调时保存的模型。

微调过程如下：

微调结束后，可以选择checkpoint复制到容器外，再调用或者merge。

2.4 调用

微调之后的调用，Qwen也给出了详细示例。

在调用时需要注意更换一下adapert_config.json中的模型路径。将base_model_name_or_path换成自己微调的模型路径。

from peft import AutoPeftModelForCausalLM
path_to_adapter = '/ssd/dongzhenheng/LLM/Qwen-Address/tezt'
model = AutoPeftModelForCausalLM.from_pretrained(
    path_to_adapter, # path to the output directory
    device_map="cuda:0",
    trust_remote_code=True
).eval()

2.5 合并

合并以docker中的为例

from peft import AutoPeftModelForCausalLM

path_to_adapter = '/data/shared/Qwen/output_qwen'
model = AutoPeftModelForCausalLM.from_pretrained(
    path_to_adapter, # path to the output directory
    device_map="auto",
    trust_remote_code=True
).eval()

new_model_directory = '/data/shared/Qwen/New_model_directory'
merged_model = model.merge_and_unload()
# max_shard_size and safe serialization are not necessary. 
# They respectively work for sharding checkpoint and save the model to safetensors
merged_model.save_pretrained(new_model_directory, max_shard_size="2048MB", safe_serialization=True)

执行过程

目录文件

New_model_directory目录将包含合并后的模型参数与相关模型代码。请注意*.cu和*.cpp文件没被保存，需要手动从Qwen-1_8B-Chat中复制。merge_and_unload仅保存模型，并未保存tokenizer，如有需要，请复制相关文件或使用以以下代码保存。

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(
    path_to_adapter, # path to the output directory
    trust_remote_code=True
)
tokenizer.save_pretrained(New_model_directory)

2.6 合并调用

合并并存储模型后即可用常规方式读取新模型。

checkpoint_path = '/data/shared/Qwen/New_model_directory'
tokenizer = AutoTokenizer.from_pretrained(
        checkpoint_path, 
        trust_remote_code=True, resume_download=True,
    )
device_map = "auto"

model = AutoModelForCausalLM.from_pretrained(
        checkpoint_path,
        device_map=device_map,
        trust_remote_code=True,
        resume_download=True,
        bf16=True
    ).eval()

model.generation_config = GenerationConfig.from_pretrained(
        checkpoint_path, trust_remote_code=True, resume_download=True,
    )