Chinese-LLaMA-Alpaca: A Hands-On Code Walkthrough

news2025/7/16 11:30:04

Contents

Fine-tuning chinese-alpaca
Deploying llama.cpp
Quantizing the FP16 model to 4-bit

Project repository: https://github.com/ymcui/Chinese-LLaMA-Alpaca

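Fine-tuning chinese-alpaca

The project is built on Chinese data. It has:

Open-sourced Chinese LLaMA models (7B, 13B) pre-trained on Chinese text data
Open-sourced Chinese Alpaca models (7B, 13B) further fine-tuned on instructions

Setting up a UI with text-generation-webui
Using text-generation-webui as an example, the steps below show how to deploy locally without merging the model weights.
1. Create a new conda environment.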

conda create -n textgen python=3.10
conda activate textgen
pip install torch torchvision torchaudio

2. Download the chinese-alpaca-lora-7b weights: https://drive.google.com/file/d/1JvFhBpekYiueWiUL3AF1TtaWDb3clY5D/view?usp=sharing

# Clone text-generation-webui
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt

# Put the downloaded LoRA weights in the loras folder
ls loras/chinese-alpaca-lora-7b
adapter_config.json  adapter_model.bin  special_tokens_map.json  tokenizer_config.json  tokenizer.model
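
Before continuing, it can help to confirm that the LoRA folder really contains the five files shown in the listing above. The following is a minimal sketch using only the standard library; the helper name `missing_lora_files` is hypothetical and is not part of text-generation-webui.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_LORA_FILES = (
    "adapter_config.json",
    "adapter_model.bin",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.model",
)


def missing_lora_files(lora_dir):
    """Return the expected file names that are absent from lora_dir."""
    root = Path(lora_dir)
    return [name for name in EXPECTED_LORA_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_lora_files("loras/chinese-alpaca-lora-7b")
    print("missing files:", missing if missing else "none")
```

An empty result means the folder matches the listing; any names returned are files that still need to be copied in.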

There are three ways to download the base model.

Download the HuggingFace-format llama-7B model files with transformers-cli:

transformers-cli download decapoda-research/llama-7b-hf --cache-dir ./llama-7b-hf

Download with snapshot_download:

pip install huggingface_hub
python
from huggingface_hub import snapshot_download
snapshot_download(repo_id="decapoda-research/llama-7b-hf", cache_dir="./llama-7b-hf")

Download with git (git-lfs must be installed beforehand):

git clone https://huggingface.co/decapoda-research/llama-7b-hf

I used the second method here.
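
With `snapshot_download` and a custom `cache_dir`, the files land in a nested `models--<org>--<name>/snapshots/<revision>/` folder rather than directly in `./llama-7b-hf`, which is why the paths in the later steps are so long. The sketch below locates that folder. `find_snapshot_dirs` is a hypothetical helper, and the layout it assumes is the one visible in the paths used in this article.

```python
from pathlib import Path


def find_snapshot_dirs(cache_dir, repo_id):
    """List snapshot folders for repo_id inside a Hugging Face style cache_dir."""
    # "decapoda-research/llama-7b-hf" -> "models--decapoda-research--llama-7b-hf"
    repo_folder = "models--" + repo_id.replace("/", "--")
    snapshots = Path(cache_dir) / repo_folder / "snapshots"
    if not snapshots.is_dir():
        return []
    return sorted(p for p in snapshots.iterdir() if p.is_dir())


if __name__ == "__main__":
    for path in find_snapshot_dirs("./llama-7b-hf", "decapoda-research/llama-7b-hf"):
        print(path)
```

Alternatively, `snapshot_download` returns the local snapshot path, so its return value can be captured directly instead of searching for the folder afterwards.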

# Put the HuggingFace-format llama-7B model files in the models folder
ls models/llama-7b-hf
pytorch_model-00001-of-00002.bin pytorch_model-00002-of-00002.bin config.json pytorch_model.bin.index.json generation_config.json
# Copy the LoRA's tokenizer files into models/llama-7b-hf
cp loras/chinese-alpaca-lora-7b/tokenizer.model ~/text-generation-webui/models/llama-7b-hf/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/

cp loras/chinese-alpaca-lora-7b/special_tokens_map.json ~/text-generation-webui/models/llama-7b-hf/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/

cp loras/chinese-alpaca-lora-7b/tokenizer_config.json ~/text-generation-webui/models/llama-7b-hf/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/
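
The three `cp` commands replace the base model's tokenizer files with the ones shipped with the Chinese LoRA, which uses an extended vocabulary. The same step can be sketched in Python. `copy_tokenizer_files` is a hypothetical helper, and the destination is whatever snapshot folder the download produced.

```python
import shutil
from pathlib import Path

# The three tokenizer files copied by the cp commands above.
TOKENIZER_FILES = (
    "tokenizer.model",
    "special_tokens_map.json",
    "tokenizer_config.json",
)


def copy_tokenizer_files(lora_dir, model_dir):
    """Copy the LoRA's tokenizer files into model_dir; return the copied names."""
    src, dst = Path(lora_dir), Path(model_dir)
    copied = []
    for name in TOKENIZER_FILES:
        if (src / name).is_file():
            shutil.copy2(src / name, dst / name)
            copied.append(name)
    return copied
```

Comparing the returned list against `TOKENIZER_FILES` shows whether any file was missing from the LoRA folder.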

# Edit modules/LoRA.py, around line 28
shared.model.resize_token_embeddings(len(shared.tokenizer))
shared.model = PeftModel.from_pretrained(shared.model, Path(f"{shared.args.lora_dir}/{lora_names[0]}"), **params)

# Now it can be run; see https://github.com/oobabooga/text-generation-webui/wiki/Using-LoRAs
# python server.py --model llama-7b-hf --lora chinese-alpaca-lora-7b
# Load in int8
python server.py --model llama-7b-hf/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/ --lora chinese-alpaca-lora-7b --load-in-8bit

This raises an error:
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([49954, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([49954, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).

Fix (replace the code with the following):
shared.model.resize_token_embeddings(49954)
assert shared.model.get_input_embeddings().weight.size(0) == 49954
shared.model = PeftModel.from_pretrained(shared.model, Path(f"{shared.args.lora_dir}/{lora_names[0]}"), **params)
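
The error arises because the LoRA checkpoint stores embeddings for 49954 tokens while the base model has 32000, so the embedding matrix has to be resized before the adapter is loaded. Rather than hard-coding the number, the target size can be read out of the error message itself. The following is a rough sketch: `parse_size_mismatches` is a hypothetical helper, and the regular expression assumes the message format shown above.

```python
import re

# Matches one "size mismatch for ..." line of the error shown above.
_MISMATCH = re.compile(
    r"size mismatch for (?P<name>[\w.]+): copying a param with shape "
    r"torch\.Size\(\[(?P<ckpt>[\d, ]+)\]\) from checkpoint, "
    r"the shape in current model is torch\.Size\(\[(?P<model>[\d, ]+)\]\)"
)


def parse_size_mismatches(message):
    """Return (parameter name, checkpoint shape, model shape) per mismatch."""
    results = []
    for match in _MISMATCH.finditer(message):
        ckpt = tuple(int(x) for x in match.group("ckpt").split(","))
        model = tuple(int(x) for x in match.group("model").split(","))
        results.append((match.group("name"), ckpt, model))
    return results
```

For the error above, the first dimension of each checkpoint shape is 49954, which is the value passed to `resize_token_embeddings` in the fix.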

To make the UI publicly accessible:
To create a public link, set share=True in launch().

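Experimental result: the generated Chinese text is fairly short.

Example (the instruction asks for a sick-leave note because of the flu):

Below is an instruction that describes a task.
Write a response that appropriately completes the request.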
### Instruction:
我得了流感，请帮我写一封请假条
### Response:
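
The prompt above follows the Alpaca instruction template, so it can be assembled programmatically for any instruction. This is a minimal sketch; `build_prompt` is a hypothetical helper, and the exact blank-line placement in the template is an assumption rather than something the article specifies.

```python
# Alpaca-style template; blank-line placement is an assumption.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_prompt(instruction):
    """Wrap a user instruction in the Alpaca-style prompt template."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


if __name__ == "__main__":
    print(build_prompt("我得了流感，请帮我写一封请假条"))
```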


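Deploying llama.cpp

Download the merged model weights, using either of the following:

Colab notebook: https://colab.research.google.com/drive/1Eak6azD3MLeb-YsfbP8UZC8wrL1ddIMI?usp=sharing
or the ipynb file in the notebooks/ folder: https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/notebooks/convert_and_quantize_chinese_llama.ipynb

Download the merged model weights to your local machine, then upload them to the server.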

# Clone the project
git clone https://github.com/ggerganov/llama.cpp
# Build
cd llama.cpp && make
# Create the model directory (the shell is already inside llama.cpp after the previous step)
mkdir -p zh-models/7B

After placing all the files from alpaca-combined into zh-models/7B, run the following from inside llama.cpp:

mv zh-models/7B/tokenizer.model zh-models/
ls zh-models/

This should show: 7B tokenizer.model
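
The same directory layout can be prepared from Python, which is convenient when the step is scripted. This is a sketch under the assumption that the merged files have already been placed in `zh-models/7B`; `arrange_model_dir` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def arrange_model_dir(llama_cpp_dir):
    """Move tokenizer.model from zh-models/7B up to zh-models.

    Returns the sorted entry names of zh-models afterwards.
    """
    zh_models = Path(llama_cpp_dir) / "zh-models"
    model_dir = zh_models / "7B"
    model_dir.mkdir(parents=True, exist_ok=True)
    tokenizer = model_dir / "tokenizer.model"
    if tokenizer.is_file():
        shutil.move(str(tokenizer), str(zh_models / "tokenizer.model"))
    return sorted(p.name for p in zh_models.iterdir())
```

When the tokenizer file was present, the returned list contains `7B` and `tokenizer.model`, matching the `ls` output above.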

Run the conversion:

python convert.py zh-models/7B/

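This generates ggml-model-f16.bin in zh-models/7B/.

Quantizing the FP16 model to 4-bit

Next, convert the FP16 model to a 4-bit quantized model. In this version of llama.cpp, the trailing argument 2 selects the q4_0 format, which matches the output file name.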

./quantize ./zh-models/7B/ggml-model-f16.bin ./zh-models/7B/ggml-model-q4_0.bin 2
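
To give an intuition for what 4-bit quantization does, the sketch below quantizes a small block of weights to 4-bit integers that share one floating-point scale, then reconstructs them. It is a deliberately simplified illustration and not llama.cpp's actual q4_0 routine; the block contents and the function names are made up for the example.

```python
def quantize_block(values):
    """Quantize a block of floats to signed 4-bit integers sharing one scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return 0.0, [0] * len(values)
    scale = max_abs / 7.0  # the largest magnitude maps to +/-7
    # Clamp to the signed 4-bit range [-8, 7].
    quants = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, quants


def dequantize_block(scale, quants):
    """Reconstruct approximate floats from the scale and 4-bit integers."""
    return [scale * q for q in quants]


if __name__ == "__main__":
    block = [0.7, -0.35, 0.0, 0.1]
    scale, quants = quantize_block(block)
    restored = dequantize_block(scale, quants)
    print("scale:", scale)
    print("quants:", quants)
    print("max error:", max(abs(a - b) for a, b in zip(block, restored)))
```

Each weight is stored in 4 bits plus a share of the per-block scale, which is why the quantized file is much smaller than the FP16 one, at the cost of a rounding error of at most half the scale per weight.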

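Run inference as needed. The command below loads the FP16 model; point -m at ggml-model-q4_0.bin to use the 4-bit model instead: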

./main -m ./zh-models/7B/ggml-model-f16.bin --color -f ./prompts/alpaca.txt -p "详细介绍一下北京的名胜古迹：" -n 512
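
When running several prompts, it may be easier to build the command from Python. The sketch below only assembles the argument list, using the flags that appear in the command above. `build_main_command` is a hypothetical helper, and actually running the command requires the compiled `./main` binary in the llama.cpp directory.

```python
import shlex


def build_main_command(model_path, prompt, n_predict=512,
                       prompt_file="./prompts/alpaca.txt", binary="./main"):
    """Return the argv list for the llama.cpp command shown above."""
    return [
        binary,
        "-m", str(model_path),
        "--color",
        "-f", str(prompt_file),
        "-p", prompt,
        "-n", str(n_predict),
    ]


if __name__ == "__main__":
    argv = build_main_command(
        "./zh-models/7B/ggml-model-q4_0.bin",
        "详细介绍一下北京的名胜古迹：",
    )
    print(shlex.join(argv))
    # To run it from inside the llama.cpp directory:
    # import subprocess; subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the Chinese prompt text.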
