Deploying the Vicuna-7B and Vicuna-13B models on Windows 11

2025/2/24 18:01:18

Running Vicuna-7B requires more than 30 GB of RAM, or 14 GB of GPU memory.
Running Vicuna-13B requires more than 60 GB of RAM, or 28 GB of GPU memory.
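As a rough sanity check on these figures, the size of the weights alone can be estimated as parameter count times bytes per parameter. The sketch below assumes nominal parameter counts (7 and 13 billion) and ignores activations, caches and framework overhead, so real usage will differ somewhat; the function name is just for illustration.

```python
def estimate_weights_gb(params_billion: float, bytes_per_param: int) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1_000_000_000 * bytes_per_param
    return total_bytes / 1_000_000_000


# Assumed nominal sizes; fp16 uses 2 bytes per parameter, fp32 uses 4.
for name, billions in [("Vicuna-7B", 7), ("Vicuna-13B", 13)]:
    fp16 = estimate_weights_gb(billions, 2)
    fp32 = estimate_weights_gb(billions, 4)
    print(f"{name}: ~{fp16:.0f} GB in fp16, ~{fp32:.0f} GB in fp32")
```

The fp16 estimates (about 14 GB and 26 GB) are close to the GPU memory figures above, and the fp32 estimates (about 28 GB and 52 GB) are in the same range as the RAM figures, which would be consistent with the CPU path holding weights in 32-bit floats. That last point is an inference, not something stated in this post.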

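If you do not have hardware like this, this guide is not for you. My laptop has 64 GB of RAM, so I tried both models, using Python 3.9. Converting the 13B model kept crashing at first; it turned out that no virtual memory (page file) had been configured. After adding 9 GB of virtual memory, the conversion ran.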

Download the original LLaMA models

nyanko7/LLaMA-7B: https://huggingface.co/nyanko7/LLaMA-7B/tree/main
huggyllama/llama-13b: https://huggingface.co/huggyllama/llama-13b/tree/main

You can also download the weights with Xunlei (Thunder) using the magnet link below. Note that only the 7B and 13B weights are needed.

Magnet link: magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA

The downloaded files are as follows:

Download vicuna-7b-delta-v1.1 and vicuna-13b-delta-v1.1

lmsys/vicuna-7b-delta-v1.1: https://huggingface.co/lmsys/vicuna-7b-delta-v1.1/tree/main
lmsys/vicuna-13b-delta-v1.1: https://huggingface.co/lmsys/vicuna-13b-delta-v1.1/tree/main
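If you prefer to script the download instead of using the browser, one option is the `huggingface_hub` package (an extra dependency this post does not otherwise use, installed with `pip install huggingface_hub`). The sketch below is only an illustration: the helper names are made up, and the local directory names are chosen to match the `--delta` paths used in the merge step.

```python
DELTA_REPOS = [
    "lmsys/vicuna-7b-delta-v1.1",
    "lmsys/vicuna-13b-delta-v1.1",
]


def local_dir_for(repo_id: str) -> str:
    """Map 'lmsys/vicuna-7b-delta-v1.1' to './vicuna-7b-delta-v1.1'."""
    return "./" + repo_id.split("/")[-1]


def download_deltas(repo_ids=DELTA_REPOS):
    """Download each delta repository into its own local directory."""
    # Imported here so the helpers above work even without the package installed.
    from huggingface_hub import snapshot_download

    for repo_id in repo_ids:
        snapshot_download(repo_id=repo_id, local_dir=local_dir_for(repo_id))
```

Calling `download_deltas()` fetches both repositories; the 13B delta is large, so make sure there is enough free disk space first.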

Install the required software

pip install fschat
pip install protobuf==3.20.0
git clone https://github.com/huggingface/transformers.git
cd transformers
python setup.py install

Convert the LLaMA models

python transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py  --input_dir LLaMA/  --model_size 7B  --output_dir ./output/llama-7b

13B:

python transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py  --input_dir LLaMA/  --model_size 13B  --output_dir ./output/llama-13b
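Because the conversion can crash partway through when memory runs out, it may be worth checking the output directories before moving on. The sketch below is a minimal check under an assumption about the output layout: that a converted checkpoint contains a `config.json` plus at least one `*.bin` or `*.safetensors` weight file. The exact file names depend on the transformers version, so treat this as a heuristic rather than a full validation.

```python
from pathlib import Path


def looks_like_hf_checkpoint(model_dir: str) -> bool:
    """Heuristic: a config.json plus at least one weight file is present."""
    path = Path(model_dir)
    has_config = (path / "config.json").is_file()
    has_weights = any(path.glob("*.bin")) or any(path.glob("*.safetensors"))
    return has_config and has_weights


# Paths taken from the --output_dir values used in the commands above.
for model_dir in ("./output/llama-7b", "./output/llama-13b"):
    status = "looks complete" if looks_like_hf_checkpoint(model_dir) else "missing config or weights"
    print(f"{model_dir}: {status}")
```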

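Merge the delta weights to produce the Vicuna models. For 13B, 64 GB of RAM is not enough on its own; setting virtual memory to roughly 16 GB to 64 GB is sufficient.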

python -m fastchat.model.apply_delta --base ./output/llama-7b --target ./vicuna-7b --delta ./vicuna-7b-delta-v1.1

python -m fastchat.model.apply_delta --base ./output/llama-13b --target ./vicuna-13b --delta ./vicuna-13b-delta-v1.1

Parameters:

base	path to the converted LLaMA model
target	path where the merged model is saved
delta	path to the downloaded delta weights (for example vicuna-7b-delta-v1.1)
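Conceptually, the merge adds each delta tensor to the base tensor of the same name and saves the result as the target. The toy sketch below illustrates that idea with plain Python lists standing in for tensors; it is not FastChat's actual code, and the function and parameter names are only illustrative.

```python
def apply_delta(base: dict, delta: dict) -> dict:
    """Return target weights where target[name] = base[name] + delta[name]."""
    target = {}
    for name, delta_values in delta.items():
        if name not in base:
            raise KeyError(f"parameter {name!r} is missing from the base model")
        base_values = base[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise sum of the base weight and its delta.
        target[name] = [b + d for b, d in zip(base_values, delta_values)]
    return target


base = {"layer.weight": [1.0, 2.0, 3.0]}
delta = {"layer.weight": [0.5, -1.0, 0.25]}
print(apply_delta(base, delta))
```

This also suggests why the merge is memory-hungry: the base weights and the delta weights both have to be loaded before the sum can be written out.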

Run the models

python -m fastchat.serve.cli --model-path ./vicuna-7b --device cpu

python -m fastchat.serve.cli --model-path ./vicuna-13b --device cpu

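The 7B model uses about 26 GB of RAM; on an i9-12900H with 64 GB of RAM it runs with acceptable response speed.

The 13B model uses about 50 GB of RAM; on the same machine it runs slowly.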

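Summary: although the smaller models can run this way, fine-tuning them yourself still requires a GPU; an A100 or A800 is recommended. If you would rather not buy hardware up front, you can rent GPUs first, for example from Matpool (矩池云), a GPU cloud provider focused on AI: https://matpool.com/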

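Model inference (Web UI)
To serve the model through a web UI, three components need to be configured:

web servers: the user-facing interface
model workers: host the model
controller: coordinates the web server and the model workers

Start the controller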

python3 -m fastchat.serve.controller --host 0.0.0.0

Start the model worker

python -m fastchat.serve.model_worker  --model-path ./vicuna-7b --model-name vicuna-7b --host 0.0.0.0 --device cpu

Once the process has finished loading the model, you will see "Uvicorn running on …". Then start the Gradio web server:

python -m fastchat.serve.gradio_web_server --port 8809
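The three commands above have to be started in order (controller, then model worker, then web server), each in its own process. The sketch below is one way to script that from Python. It only reuses the flags shown above; the helper names are made up, and the fixed delay between launches is a crude assumption standing in for waiting until each process prints "Uvicorn running on …".

```python
import sys


def build_commands(model_path: str, model_name: str, port: int) -> list:
    """Return the controller, model worker and web server commands, in start order."""
    py = sys.executable  # use the same interpreter that has fschat installed
    return [
        [py, "-m", "fastchat.serve.controller", "--host", "0.0.0.0"],
        [py, "-m", "fastchat.serve.model_worker",
         "--model-path", model_path, "--model-name", model_name,
         "--host", "0.0.0.0", "--device", "cpu"],
        [py, "-m", "fastchat.serve.gradio_web_server", "--port", str(port)],
    ]


def launch_all(commands: list, delay_seconds: float = 10.0) -> list:
    """Start each command as a background process, pausing between launches."""
    import subprocess
    import time

    processes = []
    for command in commands:
        processes.append(subprocess.Popen(command))
        time.sleep(delay_seconds)  # assumed delay; loading a model can take far longer
    return processes
```

For example, `launch_all(build_commands("./vicuna-7b", "vicuna-7b", 8809))` starts all three; on CPU the model worker may need much longer than the default delay before it is ready, so increase `delay_seconds` or start the web server by hand once the worker is up.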
