This post uses LoRA (Low-Rank Adaptation of Large Language Models) to fine-tune a Llama-2 language model.
Prerequisites for fine-tuning
First, download the Hugging Face version of Llama 2 or Llama 3 from Hugging Face.
I downloaded llama-2-13b-chat.
Fine-tuning methods for large language models
Full parameter fine-tuning updates all parameters in every layer of the pretrained model. It generally achieves the best performance, but it is also the most resource- and time-intensive approach: it requires the most GPU memory and takes the longest to run.
PEFT (Parameter-Efficient Fine-Tuning) lets you fine-tune a model with minimal resources and cost. Two important PEFT methods are LoRA (Low-Rank Adaptation) and QLoRA (Quantized LoRA), in which the pretrained model is loaded onto the GPU as quantized 8-bit and 4-bit weights, respectively. You can most likely fine-tune the Llama 2-13B model with LoRA or QLoRA on a single consumer-grade GPU with 24GB of memory, and QLoRA requires even less GPU memory and fine-tuning time than LoRA.
In general, you should try LoRA first (or QLoRA if resources are extremely limited) and evaluate the results after fine-tuning. Only consider full fine-tuning if the performance is not satisfactory.
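As an illustration of the difference, the sketch below shows one common way to load the base model as quantized 4-bit weights for QLoRA using transformers' BitsAndBytesConfig. The specific quantization settings (nf4 quant type, float16 compute) are assumptions for this example and are not part of the original script; it also requires the bitsandbytes package and a CUDA GPU.
import torch
from transformers import LlamaForCausalLM, BitsAndBytesConfig

# Quantized 4-bit loading for QLoRA-style fine-tuning
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = LlamaForCausalLM.from_pretrained(
    r"C:\apps\ml_model\llama-2-13b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",  # place the quantized weights on the available GPU(s)
)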
Fine-tuning Llama 2 with LoRA
1. Import the libraries
from transformers import LlamaTokenizer, LlamaForCausalLM, Trainer, TrainingArguments
from peft import get_peft_model, LoraConfig, TaskType
from datasets import load_from_disk, Dataset
Explanation: import the required libraries: transformers for the language model and tokenizer, peft for LoRA fine-tuning, and datasets for dataset handling.
2. Build the dataset
You can build this dataset from data in your own domain; a sketch of loading such data from a file follows the explanation below.
data = {
    "instruction": [
        "What is the capital of France?",
        "What is 2 + 2?",
        "How do you greet someone in English?"
    ],
    "response": [
        "The capital of France is Paris.",
        "2 + 2 equals 4.",
        "You greet someone by saying 'Hello' in English."
    ]
}
dataset = Dataset.from_dict(data)
dataset.save_to_disk("simple_dataset")
Explanation: this creates a simple dataset with instruction and response fields, converts it into a Dataset object, and saves it to disk.
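For real domain data, you will typically load examples from a file instead of an inline dict. A minimal sketch, assuming your data is stored in a JSON Lines file named my_domain_data.jsonl (a hypothetical file name) with the same instruction/response fields:
from datasets import load_dataset

# Each line of the file is a JSON object like:
# {"instruction": "...", "response": "..."}
dataset = load_dataset("json", data_files="my_domain_data.jsonl", split="train")
dataset.save_to_disk("simple_dataset")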
3. Load the model and tokenizer
model_name = r"C:\apps\ml_model\llama-2-13b-chat-hf"
print("starting to load tokenizer.")
tokenizer = LlamaTokenizer.from_pretrained(model_name)
print("starting to load model.")
model = LlamaForCausalLM.from_pretrained(model_name)
print("Finished to load model")
Explanation: specify the model path, load the Llama-2 tokenizer and model, and print log messages for the loading process.
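Loading a 13B model with the default settings keeps the weights in full precision and may leave them on the CPU. If a GPU is available, a common variant (an assumption on my part, not part of the original script) is to load the weights in half precision and let transformers place them automatically; device_map="auto" requires the accelerate package.
import torch
from transformers import LlamaForCausalLM

# Half-precision weights, automatically placed on the available GPU(s)
model = LlamaForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)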
4. Define the LoRA configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
)
Explanation: configure the LoRA parameters, such as the low-rank decomposition dimension r, the scaling factor lora_alpha, and the dropout rate lora_dropout.
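If you want explicit control over which layers receive LoRA adapters, LoraConfig also accepts a target_modules argument; for Llama-family models the attention query/value projections are a common choice. This is a sketch of an optional variant, not part of the original configuration (PEFT can usually infer reasonable defaults for known architectures).
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # apply LoRA only to the attention q/v projections
)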
5. Wrap the model
print("Starting to get peft model")
model = get_peft_model(model, lora_config)
tokenizer.pad_token = tokenizer.eos_token
Explanation: wrap the pretrained model with LoRA and set the tokenizer's padding token.
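To verify that only the small set of LoRA adapter weights will be trained, you can print the trainable parameter count on the wrapped model (continuing from the previous step); this helper is provided by PEFT.
# Prints something like: trainable params: ... || all params: ... || trainable%: ...
model.print_trainable_parameters()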
6. Load the dataset
dataset = load_from_disk("simple_dataset")
Explanation: load the previously saved dataset from disk.
7. Tokenize the dataset
def tokenize_function(examples):
    combined_texts = [
        instruction + " " + response
        for instruction, response in zip(examples["instruction"], examples["response"])
    ]
    tokenized_inputs = tokenizer(
        combined_texts,
        truncation=True,
        padding="max_length",
        max_length=128
    )
    tokenized_inputs["labels"] = tokenized_inputs["input_ids"].copy()
    return tokenized_inputs
tokenized_dataset = dataset.map(tokenize_function, batched=True)
Explanation: define a function that concatenates the instruction and response texts and tokenizes them. The result contains the input_ids, which are also copied as the labels.
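Because the labels are a straight copy of input_ids, the model is also trained to predict the padding tokens. A common refinement (an assumption on my part, not part of the original script) is to mask the padded positions with -100 so that the loss ignores them:
def tokenize_function(examples):
    combined_texts = [
        instruction + " " + response
        for instruction, response in zip(examples["instruction"], examples["response"])
    ]
    tokenized_inputs = tokenizer(
        combined_texts,
        truncation=True,
        padding="max_length",
        max_length=128,
    )
    # Replace pad token ids with -100 so the loss ignores them
    # (note: pad_token was set to eos_token above, so eos positions are masked too)
    tokenized_inputs["labels"] = [
        [(token if token != tokenizer.pad_token_id else -100) for token in ids]
        for ids in tokenized_inputs["input_ids"]
    ]
    return tokenized_inputs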
8. Define the training arguments
training_args = TrainingArguments(
    output_dir="./llama2-lora-finetuned",
    per_device_train_batch_size=1,
    num_train_epochs=3,
    logging_dir="./logs",
    logging_steps=10,
    save_steps=500,
    save_total_limit=2,
)
Explanation: configure the training parameters, such as the output directory, per-device training batch size, number of epochs, logging frequency, and checkpoint saving frequency.
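On a GPU, you can usually speed training up and effectively enlarge the batch size. The variant below is a sketch under the assumption that a CUDA GPU is available; gradient_accumulation_steps and fp16 are standard TrainingArguments fields, but the specific values are arbitrary.
training_args = TrainingArguments(
    output_dir="./llama2-lora-finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # effective batch size of 8
    num_train_epochs=3,
    fp16=True,                      # mixed precision on CUDA GPUs
    logging_dir="./logs",
    logging_steps=10,
    save_steps=500,
    save_total_limit=2,
)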
9. Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
)
Explanation: use the Hugging Face Trainer class, passing in the LoRA-wrapped model, the training arguments, and the dataset to prepare for training.
10. Train and save the model
print("Starting to training")
trainer.train()
print("Finished to training")
trainer.save_model("llama2-lora-finetuned")
print("model saved")
Explanation: start fine-tuning the model; after training completes, save the fine-tuned model to the specified directory and print log messages.
The execution output is as follows:
Saving the dataset (1/1 shards): 100%|██████████| 3/3 [00:00<00:00, 428.59 examples/s]
starting to load tokenizer.
starting to load model.
Loading checkpoint shards: 100%|██████████| 3/3 [03:10<00:00, 63.57s/it]
Finished to load model
Starting to get peft model
get peft model
Map: 100%|██████████| 3/3 [00:00<00:00, 37.04 examples/s]
C:\Users\Harry\anaconda3\envs\ai_service\lib\site-packages\accelerate\accelerator.py:451: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead:
dataloader_config = DataLoaderConfiguration(dispatch_batches=None)
warnings.warn(
C:\Users\Harry\anaconda3\envs\ai_service\lib\site-packages\pydantic\_internal\_fields.py:151: UserWarning: Field "model_server_url" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
warnings.warn(
C:\Users\Harry\anaconda3\envs\ai_service\lib\site-packages\pydantic\_internal\_config.py:322: UserWarning: Valid config keys have changed in V2:
* 'schema_extra' has been renamed to 'json_schema_extra'
warnings.warn(message, UserWarning)
Starting to training
100%|██████████| 9/9 [47:12<00:00, 314.57s/it]{'train_runtime': 2832.8354, 'train_samples_per_second': 0.003, 'train_steps_per_second': 0.003, 'train_loss': 5.6725747850206165, 'epoch': 3.0}
100%|██████████| 9/9 [47:12<00:00, 314.72s/it]
Finished to training
model saved
Process finished with exit code 0
You will find the LoRA adapter model in the llama2-lora-finetuned directory.
Full code
from transformers import LlamaTokenizer, LlamaForCausalLM, Trainer, TrainingArguments
from peft import get_peft_model, LoraConfig, TaskType
from datasets import load_from_disk
from datasets import Dataset
data = {
    "instruction": [
        "What is the capital of France?",
        "What is 2 + 2?",
        "How do you greet someone in English?"
    ],
    "response": [
        "The capital of France is Paris.",
        "2 + 2 equals 4.",
        "You greet someone by saying 'Hello' in English."
    ]
}
dataset = Dataset.from_dict(data)
dataset.save_to_disk("simple_dataset")
# Load the model and tokenizer
model_name = r"C:\apps\ml_model\llama-2-13b-chat-hf"
print("starting to load tokenizer.")
tokenizer = LlamaTokenizer.from_pretrained(model_name)
print("starting to load model.")
# Load the base model (full precision here; see the PEFT section above for 8-bit/4-bit loading)
model = LlamaForCausalLM.from_pretrained(model_name)
print("Finished to load model")
# Define LoRA configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,  # This is for language modeling
    r=16,                          # LoRA attention dimension
    lora_alpha=32,                 # LoRA scaling factor
    lora_dropout=0.1,              # Dropout for LoRA layers
)
print("Starting to get peft model")
# Wrap the model with LoRA
model = get_peft_model(model, lora_config)
tokenizer.pad_token = tokenizer.eos_token
# if tokenizer.pad_token is None:
#     tokenizer.add_special_tokens({'pad_token': 'PAD'})
#     model.resize_token_embeddings(len(tokenizer))
print("get peft model")
# Load the dataset
dataset = load_from_disk("simple_dataset")
# Tokenize the dataset
def tokenize_function(examples):
    combined_texts = [
        instruction + " " + response
        for instruction, response in zip(examples["instruction"], examples["response"])
    ]
    tokenized_inputs = tokenizer(
        combined_texts,
        truncation=True,
        padding="max_length",  # Ensure consistent padding
        max_length=128         # Adjust max_length as needed
    )
    tokenized_inputs["labels"] = tokenized_inputs["input_ids"].copy()
    return tokenized_inputs
tokenized_dataset = dataset.map(tokenize_function, batched=True)
# Define training arguments
training_args = TrainingArguments(
    output_dir="./llama2-lora-finetuned",
    per_device_train_batch_size=1,
    num_train_epochs=3,
    logging_dir="./logs",
    logging_steps=10,
    save_steps=500,
    save_total_limit=2,
    # fp16=True,  # Enable mixed precision
)
# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
)
print("Starting to training")
# Fine-tune the model with LoRA
trainer.train()
print("Finished to training")
# Save the fine-tuned model
trainer.save_model("llama2-lora-finetuned")
print("model saved ")
Merging the LoRA fine-tuned weights into the base model to produce a complete model
Import the required libraries:
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel, PeftConfig
Explanation: import the Llama model and tokenizer from the transformers library, and the PeftModel class from the peft library for handling the LoRA fine-tuned model.
Load the base Llama 2 model:
model_name = r"C:\apps\ml_model\llama-2-13b-chat-hf"
base_model = LlamaForCausalLM.from_pretrained(model_name)
tokenizer = LlamaTokenizer.from_pretrained(model_name)
base_model.resize_token_embeddings(len(tokenizer))
Explanation: model_name is the path to the base Llama-2 model. LlamaForCausalLM.from_pretrained(model_name) loads the pretrained base model, and LlamaTokenizer.from_pretrained(model_name) loads the matching tokenizer. base_model.resize_token_embeddings(len(tokenizer)) resizes the model's token embeddings to match the tokenizer's vocabulary size.
Load the PEFT fine-tuned model:
peft_model_path = "./llama2-lora-finetuned" # Path to your fine-tuned model
peft_model = PeftModel.from_pretrained(base_model, peft_model_path)
Explanation: load the LoRA fine-tuned adapter from peft_model_path on top of the already loaded base model base_model.
Merge the LoRA weights and unload the LoRA configuration:
peft_model = peft_model.merge_and_unload()
Explanation: merge the LoRA weights into the base model. This step applies the LoRA fine-tuned weights to the base model's weights and unloads the LoRA configuration, turning the model into a complete, standard model.
Save the merged model:
peft_model.save_pretrained("./llama2-finetuned-combined")
tokenizer.save_pretrained("./llama2-finetuned-combined")
Explanation: save the merged model and the tokenizer to the path "./llama2-finetuned-combined".
Reload the merged model and tokenizer:
from transformers import LlamaForCausalLM, LlamaTokenizer
model = LlamaForCausalLM.from_pretrained("./llama2-finetuned-combined")
tokenizer = LlamaTokenizer.from_pretrained("./llama2-finetuned-combined")
Explanation: load the merged Llama-2 model and its tokenizer from the saved path.
Use the model for inference:
inputs = tokenizer("What is the capital of France?", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Explanation:
- The tokenizer converts the question "What is the capital of France?" into tensors the model can process.
- The model's generate() method produces the output token sequence.
- The tokenizer decodes the output tokens into readable text, which is then printed.
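By default, generate() may stop after a fairly short sequence. A common tweak (an assumption for this example, not part of the original script) is to pass an explicit generation length:
inputs = tokenizer("What is the capital of France?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)  # allow up to 64 new tokens in the answer
print(tokenizer.decode(outputs[0], skip_special_tokens=True))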
Full code
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel, PeftConfig
# Load the base LLaMA 2 model
model_name = r"C:\apps\ml_model\llama-2-13b-chat-hf"
base_model = LlamaForCausalLM.from_pretrained(model_name)
tokenizer = LlamaTokenizer.from_pretrained(model_name)
base_model.resize_token_embeddings(len(tokenizer))
# Load the PEFT fine-tuned model
peft_model_path = "./llama2-lora-finetuned" # Path to your fine-tuned model
peft_model = PeftModel.from_pretrained(base_model, peft_model_path)
# Merge the LoRA weights into the base model
peft_model = peft_model.merge_and_unload()
peft_model.save_pretrained("./llama2-finetuned-combined")
tokenizer.save_pretrained("./llama2-finetuned-combined")
from transformers import LlamaForCausalLM, LlamaTokenizer
model = LlamaForCausalLM.from_pretrained("./llama2-finetuned-combined")
tokenizer = LlamaTokenizer.from_pretrained("./llama2-finetuned-combined")
# Now you can use the model for inference
inputs = tokenizer("What is the capital of France?", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
After running this, you will find the merged model files in the ./llama2-finetuned-combined directory.
To convert the Hugging Face model to GGUF and quantize it, see the following blog post:
Huggingface 模型转换成gguf并且量化_sft训练模型转gguf-CSDN博客
Miscellaneous
Convert the Hugging Face model to GGUF format
python convert.py C:\Users\Harry\PycharmProjects\llm-finetuning\llama2-finetuned-combined --outfile C:\Users\Harry\PycharmProjects\llm-finetuning\llama2-finetuned-combined\llama2-7b-chat_f16.gguf --outtype f16
Quantize the GGUF model
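The original post does not show the exact quantization command. The line below is a sketch assuming llama.cpp's quantize tool (the same era of llama.cpp as convert.py and main) and the q4_0 type implied by the file name used in the test command that follows:
quantize C:\Users\Harry\PycharmProjects\llm-finetuning\llama2-finetuned-combined\llama2-7b-chat_f16.gguf C:\Users\Harry\PycharmProjects\llm-finetuning\llama2-finetuned-combined\llama2-7b-chat_f16-q4_0.gguf q4_0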
Run the following command to test the model:
main -m C:\Users\Harry\PycharmProjects\llm-finetuning\llama2-finetuned-combined\llama2-7b-chat_f16-q4_0.gguf --color --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1.1 -t 8
GitHub - huggingface/peft: 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.