MiniGPT-4 模型学习与实战

1 前言

MiniGPT-4 是一个冻结的视觉编码器(Q-Former&ViT)与一个冻结的文本生成大模型（Vicuna，江湖人称：小羊驼）进行对齐造出来的。

MiniGPT-4 具有许多类似于 GPT-4 的能力, 图像描述生成、从手写草稿创建网站等
MiniGPT-4 还能根据图像创作故事和诗歌，为图像中显示的问题提供解决方案，教用户如何根据食物照片做饭等。

2 模型介绍

2.1 模型结构介绍

投影层（Projection Layer）是神经网络中常见层类型，将输入数据从一个空间映射到另一个空间。
NLP中，投影层通常用于将高维词向量映射到低维空间，以减少模型参数数量和计算量。
CV中，投影层可以将高维图像特征向量映射到低维空间，以便于后续处理和分析。

在这里插入图片描述

2.2 fine tune 介绍

先是在 4 个 A100 上用 500 万图文对训练
然后再用一个小的高质量数据集训练，单卡 A100 训练只需要 7 分钟。

2.3 模型效果介绍

在零样本 VQAv2 上，BLIP-2 相较于 80 亿参数的 Flamingo 模型，使用的可训练参数数量少了 54 倍，性能提升了 8.7 %。

3 环境搭建

3.1 下载代码

git clone https://github.com/Vision-CAIR/MiniGPT-4.git

3.2 构建环境

cd MiniGPT-4
conda env create -f environment.yml
conda activate minigpt4

4 MiniGPT-4 模型下载

参考：How to Prepare Vicuna Weight
1、下载 Vicuna Weight；
2、下载原始LLAMA-7B或LLAMA-13B权重；
3、构建真正的 working weight
4、配置模型路径：MiniGPT-4/minigpt4/configs/models/minigpt4.yaml第16行，将 “/path/to/vicuna/weights/” 修改为本地weight地址

4.1 下载 Vicuna Weight

当前版本的MiniGPT-4是建立在v0版本的 Vicuna-13B 之上的。请参考我们的说明来准备 Vicuna weights。最终的权重将在结构类似于以下的单个文件夹中:

git clone https://huggingface.co/lmsys/vicuna-13b-delta-v1.1
# or
git clone https://huggingface.co/lmsys/vicuna-7b-delta-v1.1

请注意，这不是直接的 working weight ，而是LLAMA-13B的 working weight 与 original weight 的差值。(由于LLAMA的规则，我们无法分配LLAMA的 weight 。

4.2 下载 LLAMA Weight

git clone https://huggingface.co/decapoda-research/llama-13b-hf  # more powerful, need at least 24G gpu memory
# or
git clone https://huggingface.co/decapoda-research/llama-7b-hf  # smaller, need 12G gpu memory

量力而行⬆️上面是官方教程给的，但是7b的权重文件和vicuna-delta的7b对不上
📢注意：LLAMA的权重用这个更好：
llama-7b

4.3 构建真正的 working weight

当这两个 weight 备好后，我们可以使用Vicuna团队的工具来创建真正的 working weight 。首先，安装与v0 Vicuna兼容的库

pip install git+https://github.com/lm-sys/FastChat.git@v0.1.10

执行如下命令创建最终 working weight：

python -m fastchat.model.apply_delta --base /path/to/llama-13bOR7b-hf/  --target /path/to/save/working/vicuna/weight/  --delta /path/to/vicuna-13bOR7b-delta-v1.1/ --low-cpu-mem
>>>
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. 
The class this function is called from is 'LlamaTokenizer'.
Split files for the base model to /tmp/tmptu2g17_d
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [01:47<00:00,  3.26s/it]
Split files for the delta model to /tmp/tmpol8jc2oy
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:03<00:00, 31.92s/it]
Applying the delta
33it [02:09,  3.91s/it]
Saving the target model to vicuna/weight/

注：低CPU内存需加入–low-cpu-mem，可以把大的权重文件分割成多个小份，并使用磁盘作为临时存储。可以使峰值内存保持在16GB以下。不然无法载入vicuna增量文件，CPU内存占满，程序直接被kill，

output

config.json           pytorch_model-16.bin  pytorch_model-23.bin  pytorch_model-30.bin  pytorch_model-8.bin
pytorch_model-0.bin   pytorch_model-17.bin  pytorch_model-24.bin  pytorch_model-31.bin  pytorch_model-9.bin
pytorch_model-10.bin  pytorch_model-18.bin  pytorch_model-25.bin  pytorch_model-32.bin  pytorch_model.bin.index.json
pytorch_model-11.bin  pytorch_model-19.bin  pytorch_model-26.bin  pytorch_model-3.bin   special_tokens_map.json
pytorch_model-12.bin  pytorch_model-1.bin   pytorch_model-27.bin  pytorch_model-4.bin   tokenizer_config.json
pytorch_model-13.bin  pytorch_model-20.bin  pytorch_model-28.bin  pytorch_model-5.bin   tokenizer.model
pytorch_model-14.bin  pytorch_model-21.bin  pytorch_model-29.bin  pytorch_model-6.bin
pytorch_model-15.bin  pytorch_model-22.bin  pytorch_model-2.bin   pytorch_model-7.bin

4.4 配置模型路径

# Vicuna
llama_model: "chat/vicuna/weight"   # 将 "/path/to/vicuna/weights/"  修改为本地 weight 地址

5 Prepare the pretrained MiniGPT-4 checkpoint

5.1 下载 MiniGPT-4 checkpoint

方法一：从 google drive 下载
- Checkpoint Aligned with Vicuna 13B: https://drive.google.com/file/d/1a4zLvaiDBr-36pasffmgpvH5P7CKmpze/view?usp=share_link
- Checkpoint Aligned with Vicuna 7B: https://drive.google.com/file/d/1RY9jV0dyqLX-o38LrumkKRh6Jtaop58R/view?usp=sharing
方法二：huggingface 平台下载
- prerained_minigpt4_7b.pth：https://www.huggingface.co/wangrongsheng/MiniGPT4-7B/tree/main
- pretrained_minigpt4.pth：https://www.huggingface.co/wangrongsheng/MiniGPT4/tree/main

git lfs install
git clone https://www.huggingface.co/wangrongsheng/MiniGPT4-7B

5.2 在 eval_configs/minigpt4_eval.yaml 的第11行设置 MiniGPT-4 checkpoint 路径

    model:
    arch: mini_gpt4
    model_type: pretrain_vicuna
    freeze_vit: True
    freeze_qformer: True
    max_txt_len: 160
    end_sym: "###"
    low_resource: True
    prompt_path: "prompts/alignment.txt"
    prompt_template: '###Human: {} ###Assistant: '
    ckpt: '/path/to/pretrained/ckpt/'       # 修改为 MiniGPT-4 checkpoint 路径
    ...

5.3 在本地启动 MiniGPT-4 demo

本地通过以下命令 demo.py 运行 MiniGPT-4 demo

python demo.py --cfg-path eval_configs/minigpt4_eval.yaml  --gpu-id 0

注：为了节省GPU内存，Vicuna默认加载为8位，波束搜索宽度为1。这种配置对于Vicuna 13B需要大约23G GPU内存，对于Vicuna7B需要大约11.5G GPU内存。对于更强大的GPU，您可以通过在配置文件minigpt4_eval.yaml中将low_resource设置为False以16位运行模型，并使用更大的波束搜索宽度。

5.4 训练 MiniGPT-4

MiniGPT-4的训练包含两个 alignment stages.
MiniGPT-4 —— First pretraining stage

在第一个预训练阶段，使用 Laion和CC数据集的图像-文本对来训练模型，以对齐视觉和语言模型。要下载和准备数据集，请查看我们的第一阶段数据集准备说明。在第一阶段之后，视觉特征被映射，并且可以被语言模型理解。要启动第一阶段培训，请运行以下命令。在我们的实验中，我们使用了4个A100。您可以在配置文件 train_configs/minigpt4_stage1_pretrain.yaml 中更改保存路径

torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage1_pretrain.yaml

rain_configs/minigpt4_stage1_pretrain.yaml 介绍

    model:
    arch: mini_gpt4
    model_type: pretrain_vicuna
    freeze_vit: True
    freeze_qformer: True

    datasets:
    laion:
        vis_processor:
        train:
            name: "blip2_image_train"
            image_size: 224
        text_processor:
        train:
            name: "blip_caption"
        sample_ratio: 115
    cc_sbu:
        vis_processor:
            train:
            name: "blip2_image_train"
            image_size: 224
        text_processor:
            train:
            name: "blip_caption"
        sample_ratio: 14

    run:
    task: image_text_pretrain
    # optimizer
    lr_sched: "linear_warmup_cosine_lr"
    init_lr: 1e-4
    min_lr: 8e-5
    warmup_lr: 1e-6

    weight_decay: 0.05
    max_epoch: 4
    batch_size_train: 64
    batch_size_eval: 64
    num_workers: 4
    warmup_steps: 5000
    iters_per_epoch: 5000

    seed: 42
    output_dir: "output/minigpt4_stage1_pretrain"

    amp: True
    resume_ckpt_path: null

    evaluate: False 
    train_splits: ["train"]

    device: "cuda"
    world_size: 1
    dist_url: "env://"
    distributed: True

只有第一阶段训练的MiniGPT-4 checkpoint 可以在这里下载。与第二阶段之后的模型相比，该 checkpoint 频繁地生成不完整和重复的句子。

MiniGPT-4 —— Second finetuning stage

在第二阶段，我们使用自己创建的小型高质量图像-文本对数据集，并将其转换为对话格式，以进一步对齐MiniGPT-4。要下载和准备我们的第二阶段数据集，请查看我们的 second stage dataset preparation instruction。

要启动第二阶段对齐，首先在 train_configs/minigpt4_stage1_pretrain.yaml 中指定阶段1中训练的 checkpoint 文件的路径。您也可以在那里指定输出路径。然后，运行以下命令。在我们的实验中，我们使用1 A100。

 torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage2_finetune.yaml

踩坑手册

error: RPC failed； curl 28 OpenSSL SSL_read: Connection was reset, errno 10054
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.

MiniGPT-4 本地部署 RTX 3090
LLaMATokenizer does not exist or is not currently imported- LLaMA 4-bit

打开fastchat.model.apply_delta.py
使用文本替换，将所有的
- AutoTokenizer 替换为 LlamaTokenizer
- AutoModelForCausalLM 替换为 LlamaForCausalLM
- 保存
重新运行上面的命令即可。

如果你的CPU内存不足，您也可以尝试通过这些方法来减少权重转换对 CPU 内存的要求

方案一：将 --low-cpu-mem 追加到上面的命令中，这会将大权重文件拆分为较小的文件，并将磁盘用作临时存储。这可以将峰值内存保持在 16GB 以下；
- python -m fastchat.model.apply_delta --base C:\Users\admin\wws\LLMS\Vicuna\llama-7b-hf --target C:\Users\admin\wws\LLMS\Vicuna\vicuna-7b-weight --delta C:\Users\admin\wws\LLMS\Vicuna\vicuna-7b-delta-v1.1 --low-cpu-mem
方案二：创建一个大的交换文件并依靠操作系统自动的将磁盘当作虚拟内存。

tensor尺度不一致

bug：tensor尺度不一致

RuntimeError: The size of tensor a (32000) must match the size of tensor b (32001) at non-singleton dimension 0

当使用v0版本时，生成vicuna权重出错（bug：tensor尺度不一致），而换为v1.1版本即可解决。

在第二阶段对齐后，MiniGPT-4能够连贯地谈论图像并且用户友好。

参考

【LLMs 入门实战 —— 八】MiniGPT-4 模型学习与实战

MiniGPT-4 模型学习
【LLMs 入门实战】第二式
论文：《MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models》
Vision-CAIR/MiniGPT-4
Vision-CAIR/MiniGPT-4/blob/main/PrepareVicuna.md
MiniGPT-4｜图像对话模型
lm-sys/FastChat
lmsys/vicuna-7b-delta-v1.1
小羊驼模型(FastChat-vicuna)运行踩坑记录
大模型也内卷，Vicuna训练及推理指南，效果碾压斯坦福羊驼
MiniGPT-4 本地部署 RTX 3090 （bug：默认conda装的环境torch不带cuda，手动pip 装了 1.13.1 和cuda 117 解决了）
MiniGPT-4，开源了！
Vicuna 模型学习与实战