LLM - Fine-Tuning the Flux Image Generation Model with LoRA on a Custom Image Dataset

news2025/10/19 20:49:41

Follow me on CSDN: https://spike.blog.csdn.net/
Article URL: https://spike.blog.csdn.net/article/details/141638928

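Disclaimer: This article is based on personal knowledge and public materials and is intended for academic exchange only. Discussion is welcome; reposting is not permitted.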

LoRA

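Fine-tuning with LoRA (Low-Rank Adaptation) in a Diffusion image generation framework has several difficulties. Parameter updates must be controlled tightly enough that the pretrained model's knowledge is not destroyed, while the diversity and quality of the generated images are preserved. This calls for a careful optimization strategy and efficient use of compute, and the model has to adapt to the specific task without losing its ability to generalize. In practice, finding a suitable low-rank matrix size (rank) and learning rate for a given dataset usually takes a fair amount of experimentation and tuning.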
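To make the low-rank idea concrete, here is a minimal sketch in plain Python of how a LoRA update is usually formed: the frozen weight W is combined with a scaled product of two small matrices, W + (alpha / rank) * (B @ A). The matrix values and the helper names (matmul, lora_weight) are made up for illustration and are not taken from AI Toolkit.

```python
# Minimal LoRA sketch in plain Python (no framework), for illustration only.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_weight(W, A, B, alpha, rank):
    """Return the effective weight W + (alpha / rank) * (B @ A).

    W: d_out x d_in frozen pretrained weight
    A: rank x d_in trainable matrix
    B: d_out x rank trainable matrix (commonly zero-initialised)
    """
    scale = alpha / rank
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 x 3 frozen weight (toy values)
A = [[0.5, -0.5, 1.0]]                  # rank 1: 1 x 3
B_zero = [[0.0], [0.0]]                 # 2 x 1, zero init: no change to W yet
B_trained = [[1.0], [2.0]]              # 2 x 1, hypothetical values after training

start = lora_weight(W, A, B_zero, alpha=1, rank=1)
tuned = lora_weight(W, A, B_trained, alpha=1, rank=1)
print(start)
print(tuned)
```

With B set to zero the effective weight equals the pretrained one, which is why training can start from the base model's behavior; only A and B are updated, so the number of trainable values grows with the rank rather than with the full weight size.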

The LoRA training framework used here is AI Toolkit by Ostris. Set up the development environment:

git clone https://github.com/ostris/ai-toolkit
cd ai-toolkit
git submodule update --init --recursive
conda create -n ai-toolkit python=3.9
conda activate ai-toolkit
pip3 install torch==2.4.0 torchvision torchaudio -i https://pypi.tuna.tsinghua.edu.cn/simple

Check that PyTorch was installed correctly:

python

import torch
print(torch.__version__)  # 2.4.0
print(torch.cuda.is_available())  # True
exit()

Then install the remaining Python dependencies:

pip install -r requirements.txt

Set up the training configuration file:

cd ai-toolkit/config/examples/
cp train_lora_flux_24gb.yaml ../.

The important parameters are the LoRA settings, batch_size, lr (learning rate), and the save precision, which is float16:

network:
  type: "lora"
  linear: 16
  linear_alpha: 16
train:
  batch_size: 1
  steps: 2000  # total number of steps to train 500 - 4000 is a good range
  gradient_accumulation_steps: 1
  lr: 1e-4
save:
  dtype: float16 # precision to save
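As a rough sanity check on these values, the sketch below mirrors the config as a Python dict and derives a few quantities from it. It assumes the common LoRA convention that the update is scaled by linear_alpha / linear, and that each step consumes batch_size * gradient_accumulation_steps samples; neither assumption has been verified against the AI Toolkit source.

```python
# Values copied from the YAML fragment above.
config = {
    "network": {"type": "lora", "linear": 16, "linear_alpha": 16},
    "train": {
        "batch_size": 1,
        "steps": 2000,
        "gradient_accumulation_steps": 1,
        "lr": 1e-4,
    },
}

def summarize(cfg):
    """Derive the LoRA scale and the sample budget from the config."""
    net, train = cfg["network"], cfg["train"]
    # Assumption: the low-rank update is multiplied by alpha / rank.
    scale = net["linear_alpha"] / net["linear"]
    # Assumption: one step processes batch_size * accumulation samples.
    effective_batch = train["batch_size"] * train["gradient_accumulation_steps"]
    samples_seen = effective_batch * train["steps"]
    return {
        "lora_scale": scale,
        "effective_batch": effective_batch,
        "samples_seen": samples_seen,
    }

summary = summarize(config)
print(summary)
```

Under these assumptions the scale is 1.0 and the run sees 2000 samples in total, so raising batch_size or gradient_accumulation_steps without lowering steps increases the total amount of training.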

Edit the relevant paths in the configuration file train_lora_flux_24gb.yaml:

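folder_path: the image dataset, containing the images and, for each image, a caption file with the same name. The captions were prepared with the Joy Caption model.
name_or_path: the base Diffusion model, here FLUX.1-dev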

folder_path: "joy-caption-pre-alpha/image_datasets/yky_ori_dataset"
name_or_path: "FLUX.1-dev"
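Since each image needs a caption file with the same name, it can help to check the dataset folder before training. The sketch below is one way to do that; it assumes captions use the .txt extension and that the listed image extensions cover the dataset, and find_missing_captions is a hypothetical helper, not part of AI Toolkit.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

def find_missing_captions(folder):
    """Return the names of images that have no same-named .txt caption."""
    missing = []
    for path in sorted(Path(folder).iterdir()):
        is_image = path.suffix.lower() in IMAGE_EXTS
        if is_image and not path.with_suffix(".txt").is_file():
            missing.append(path.name)
    return missing
```

For example, find_missing_captions("joy-caption-pre-alpha/image_datasets/yky_ori_dataset") should return an empty list when every image has its caption.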

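In each caption of the image dataset, add the sentence "This is a photo of a girl named [xxx]." as the first line to reinforce the keyword. The dataset has about 40 images with their captions.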
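Adding the same first line to about 40 caption files by hand is tedious, so a small script can do it. This is a minimal sketch assuming the captions are UTF-8 .txt files in one folder; prepend_trigger is a hypothetical helper, and [xxx] is kept as the placeholder from the text above.

```python
from pathlib import Path

def prepend_trigger(folder, trigger):
    """Insert `trigger` as the first line of every .txt caption lacking it.

    Returns the number of files that were modified.
    """
    changed = 0
    for caption in sorted(Path(folder).glob("*.txt")):
        text = caption.read_text(encoding="utf-8")
        if text.startswith(trigger):
            continue  # already prefixed, so re-running is safe
        caption.write_text(trigger + "\n" + text, encoding="utf-8")
        changed += 1
    return changed
```

A call such as prepend_trigger("joy-caption-pre-alpha/image_datasets/yky_ori_dataset", "This is a photo of a girl named [xxx].") would update the captions in place; back up the folder first, since the files are overwritten.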

Run the training script on a single GPU:

CUDA_VISIBLE_DEVICES=1 nohup python -u run.py config/train_lora_flux_24gb.yaml > nohup.out &

GPU memory usage: 22892M with batch_size=1 and 32076M with batch_size=4. If the run fails with an error, see the fix under Bug1.
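The two measurements above allow a very rough estimate for other batch sizes. The sketch below fits a straight line through the two points; treating memory as linear in batch_size is an assumption, and real usage can deviate from it, so the numbers are only a guide.

```python
def fit_line(p1, p2):
    """Return (slope, intercept) of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

# Measurements from the text: (batch_size, memory in M).
per_sample, fixed = fit_line((1, 22892), (4, 32076))

def estimate_memory(batch_size):
    """Linear estimate of GPU memory (same unit as the measurements)."""
    return fixed + per_sample * batch_size

for bs in (1, 2, 3, 4):
    print(bs, round(estimate_memory(bs)))
```

Under this linear assumption each extra sample in the batch costs roughly 3061M on top of a fixed cost of about 19831M.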

The output folder after training for 2000 steps is shown below; my_first_flux_lora_v1.safetensors is the final model:

├── [1.8K]  config.yaml
├── [164M]  my_first_flux_lora_v1.safetensors
├── [164M]  my_first_flux_lora_v1_000001000.safetensors
├── [164M]  my_first_flux_lora_v1_000001250.safetensors
├── [164M]  my_first_flux_lora_v1_000001500.safetensors
├── [164M]  my_first_flux_lora_v1_000001750.safetensors
├── [165M]  optimizer.pt
└── [4.0K]  samples
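When comparing checkpoints, it is handy to recover the step number from each filename. The sketch below assumes the naming pattern seen in the listing above (a 9-digit, zero-padded step before the extension); checkpoint_step is a hypothetical helper.

```python
import re

# Assumed pattern: "<name>_<9-digit step>.safetensors", as in the listing.
STEP_RE = re.compile(r"_(\d{9})\.safetensors$")

def checkpoint_step(filename):
    """Return the training step encoded in a checkpoint name, or None."""
    match = STEP_RE.search(filename)
    return int(match.group(1)) if match else None

files = [
    "config.yaml",
    "my_first_flux_lora_v1.safetensors",
    "my_first_flux_lora_v1_000001000.safetensors",
    "my_first_flux_lora_v1_000001250.safetensors",
    "my_first_flux_lora_v1_000001500.safetensors",
    "my_first_flux_lora_v1_000001750.safetensors",
    "optimizer.pt",
]

# Keep only intermediate checkpoints, keyed by filename.
intermediate = {
    name: checkpoint_step(name)
    for name in files
    if checkpoint_step(name) is not None
}
latest = max(intermediate, key=intermediate.get)
print(latest, intermediate[latest])
```

The final model has no step suffix, so it is excluded from the intermediate set; the intermediate checkpoints are the ones to try when the final model looks overfitted.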

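samples contains the images generated during training, which show how the model changes over time. The prompts come from the sample/prompts field of the configuration.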

Results of running Flux + LoRA in ComfyUI with different LoRA model weights:
[Image: LoRA]

Reference: Configuring the Flux + LoRA workflow in ComfyUI to refine image attributes

Bug1:

TypeError: unsupported operand type(s) for |: 'torch._C._TensorMeta' and 'NoneType'

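Fix: edit ai-toolkit/toolkit/config_modules.py in 2 places. The X | None type syntax is only supported at runtime from Python 3.10 onward, and the environment above uses Python 3.9, so use the comma form of Union instead:

Union[torch.Tensor, None]  # was: Union[torch.Tensor | None]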
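The behavior behind this error can be reproduced without PyTorch. The sketch below uses a stand-in class (FakeTensor, hypothetical) in place of torch.Tensor and checks whether the running interpreter accepts the | form on classes.

```python
import sys
from typing import Optional, Union

class FakeTensor:
    """Stand-in for torch.Tensor so the example runs without PyTorch."""

# The comma form works on every supported Python version.
MaybeTensor = Union[FakeTensor, None]

def supports_pipe_union():
    """Return True if `SomeClass | None` is accepted at runtime."""
    try:
        FakeTensor | None  # raises TypeError before Python 3.10
        return True
    except TypeError:
        return False

print(sys.version_info[:2], supports_pipe_union())
print(MaybeTensor == Optional[FakeTensor])
```

On Python 3.9 supports_pipe_union() returns False, which matches the TypeError above; creating the conda environment with Python 3.10 or later would be another way to avoid the error, though that route was not tested here.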

Reference: Error for Flux Dev Lora => TypeError: unsupported operand type(s) for |: 'torch._C._TensorMeta' and 'NoneType'

Bug2:

ai-toolkit/lib/python3.9/site-packages/controlnet_aux/mediapipe_face/mediapipe_face_common.py:7: UserWarning: The module 'mediapipe' is not installed. The package will have limited functionality. Please install it using the command: pip install 'mediapipe'

Fix: install the mediapipe package.

pip install mediapipe -i https://pypi.tuna.tsinghua.edu.cn/simple

Bug3:

controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_5m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_5m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)

Fix: downgrade the timm package.

pip show timm
pip install timm==0.9.10
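To confirm that the fixes for Bug2 and Bug3 took effect, the installed versions can be checked from Python with the standard library. This is a small sketch; installed_version and check_pins are hypothetical helpers, and the pin for timm is the version used above.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def check_pins(pins):
    """Map each package to (found_version, ok).

    A pin of None means "any installed version is fine".
    """
    report = {}
    for package, wanted in pins.items():
        found = installed_version(package)
        ok = found is not None if wanted is None else found == wanted
        report[package] = (found, ok)
    return report

report = check_pins({"timm": "0.9.10", "mediapipe": None})
for package, (found, ok) in report.items():
    print(package, found, "OK" if ok else "needs attention")
```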

For other Flux LoRA models, see XLabs AI: https://huggingface.co/XLabs-AI

XLabs AI is a part of an international company, a product laboratory where we strive to become leaders in machine learning and neural networks. The company develops and implements revolutionary solutions, setting new standards and inspiring to achieve the impossible in the field of information technology. Our team is an open, energized, and young collective that welcomes innovative ideas and supports the initiative and creativity of our employees.


LoRA: https://huggingface.co/XLabs-AI/flux-lora-collection/tree/main. The LoRA model files are 22.4M or 44.8M in size, with the largest at 359M; the size depends mainly on the rank.
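The link between rank and file size follows from the parameter count: a LoRA adapter on a linear layer with d_in inputs and d_out outputs stores rank * (d_in + d_out) values. The sketch below illustrates the linear scaling; the layer shapes and ranks are hypothetical and are not the actual configuration of the XLabs AI models.

```python
def lora_params(d_in, d_out, rank):
    """Parameters in one LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)

def lora_megabytes(layer_shapes, rank, bytes_per_param=2):
    """Approximate adapter size in MB, assuming 2 bytes per value (float16)."""
    total = sum(lora_params(d_in, d_out, rank) for d_in, d_out in layer_shapes)
    return total * bytes_per_param / 1e6

# Hypothetical set of adapted layers, chosen only to show the scaling.
shapes = [(3072, 3072)] * 4

sizes = {rank: lora_megabytes(shapes, rank) for rank in (4, 8, 64)}
for rank, mb in sizes.items():
    print(rank, round(mb, 3))
```

Doubling the rank doubles the size, and a 16x larger rank gives a 16x larger file. The sizes listed above have the same ratios (44.8 / 22.4 = 2 and 359 / 22.4 ≈ 16), which is consistent with the files differing mainly in rank.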
