Thanks to @顾子韵, Tass, and other friends for their help; this tutorial could not have been completed without them. If you are interested, message me or him to join the group and learn together.
TL;DR quick-start version
A step-by-step video tutorial is available on Bilibili, so you can follow along with the video:
1. cd /root — change to root's home directory
2. apt update && apt upgrade -y && apt install cmake -y — update packages and install cmake
3. export ALL_PROXY=socks5://<hostname>:<port> — set a proxy (prepare your own)
4. wget -e https_proxy=<hostname>:<port> https://github.com/conda-forge/miniforge/releases/download/23.3.1-1/Miniforge3-Linux-aarch64.sh — download Miniforge
5. sudo bash Miniforge3-Linux-aarch64.sh
6. Work through the Miniforge installer, pressing space to page through. (You can search online for conda installation steps; they are much the same.)
7. source ~/.bashrc — activate conda's Python environment
8. wget -e https_proxy=<hostname>:<port> https://github.com/llvm/llvm-project/releases/download/llvmorg-17.0.2/clang+llvm-17.0.2-aarch64-linux-gnu.tar.xz — download LLVM
9. sudo tar -xvf clang+llvm-17.0.2-aarch64-linux-gnu.tar.xz
10. git clone --recursive https://github.com/mlc-ai/relax.git tvm_unity && cd tvm_unity/
11. mkdir -p build && cd build
12. cp ../cmake/config.cmake .
13. Use vim to set the following entries in config.cmake:
set(CMAKE_BUILD_TYPE RelWithDebInfo) # not present in the file; add it
set(USE_OPENCL ON) # present in the file; change it
set(HIDE_PRIVATE_SYMBOLS ON) # not present in the file; add it
set(USE_LLVM /root/clang+llvm-17.0.2-aarch64-linux-gnu/bin/llvm-config) # present in the file; change it
14. cmake ..
15. make -j8 — start compiling TVM
16. cd ../python
17. pip3 install --user .
18. Use vim to add an environment variable at the bottom of /root/.bashrc: export PATH="$PATH:/root/.local/bin" (or append it non-interactively; see the sketch after this list)
19. source ~/.bashrc — reload the environment variables
20. tvmc — check that TVM installed successfully
21. git clone --recursive https://github.com/mlc-ai/mlc-llm.git && cd mlc-llm
22. pip3 install --user .
23. python3 -m mlc_llm.build --help
24. mkdir -p dist/models && cd dist/models
25. git lfs install && git clone https://huggingface.co/THUDM/chatglm2-6b-32k
26. vim chatglm2-6b-32k/config.json
27. Add the entry "vocab_size": 65024
28. cd ../..
Before continuing, install the OpenCL driver following the steps at https://llm.mlc.ai/docs/install/gpu.html#orange-pi-5-rk3588-based-sbc, then proceed with the steps below.
29. python3 -m mlc_llm.build --model chatglm2-6b-32k --target opencl --max-seq-len 32768 --quantization q8f16_1 — run this step in the /root/mlc-llm directory
30. curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh — install Rust
31. Set the Rust environment variable by adding this at the bottom of /root/.bashrc: export PATH="$PATH:/root/.cargo/bin" (also covered by the sketch after this list)
32. mkdir -p build && cd build — run this command in the /root/mlc-llm directory
33. python3 ../cmake/gen_cmake_config.py
34. cmake .. && cmake --build . --parallel $(nproc) && cd ..
35. ls -l ./build/
36. ./build/mlc_chat_cli --help
37. ./build/mlc_chat_cli --model chatglm2-6b-32k-q8f16_1 --device opencl — run this in /root/mlc-llm; note that the leading "." is part of the command!
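As a convenience for steps 18 and 31, here is a minimal sketch that appends both PATH entries to /root/.bashrc non-interactively instead of editing it in vim; it assumes you run everything as root, as the steps above do.
# Append both PATH entries from steps 18 and 31, then reload the shell config.
cat >> /root/.bashrc <<'EOF'
export PATH="$PATH:/root/.local/bin"
export PATH="$PATH:/root/.cargo/bin"
EOF
source /root/.bashrc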
Prepare
- RK3588 device (OrangePi 5 Plus 16GB, Radxa Rock 5B 16GB, Nanopc T6 16GB)
- LLVM
- TVM
- OpenCL
- MLC-LLM
- Python 3.10 or higher (with pip)
- Models you want to compile
- Skills to access GitHub and Huggingface
You can follow the instructions at https://llm.mlc.ai/docs/install/gpu.html#orange-pi-5-rk3588-based-sbc to install OpenCL.
TVM
Install the minimal prerequisites.
sudo apt-get update
sudo apt-get install -y python3 python3-dev python3-setuptools gcc libtinfo-dev zlib1g-dev build-essential cmake libedit-dev libxml2-dev
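Optionally, run a quick sanity check that the toolchain is in place before building; exact version numbers will vary with your distro image.
# Confirm the build prerequisites are installed and on PATH.
python3 --version
cmake --version
gcc --version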
LLVM
wget https://github.com/llvm/llvm-project/releases/download/llvmorg-16.0.4/clang+llvm-16.0.4-aarch64-linux-gnu.tar.xz
tar xvf clang+llvm-16.0.4-aarch64-linux-gnu.tar.xz
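Optionally, confirm that the bundled llvm-config runs on your board and reports the expected version and target.
# Should print 16.0.4 and an aarch64 host target triple.
./clang+llvm-16.0.4-aarch64-linux-gnu/bin/llvm-config --version
./clang+llvm-16.0.4-aarch64-linux-gnu/bin/llvm-config --host-target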
TVM
There are two TVM repositories. Please do not use the one from https://tvm.apache.org/ or https://github.com/apache/tvm/, because with that repository we cannot import tvm.relax in Python. You should download it from mlc-ai/relax.git instead.
Download tvm from GitHub.
# clone from GitHub
git clone --recursive https://github.com/mlc-ai/relax.git tvm_unity && cd tvm_unity/
# create build directory
mkdir -p build && cd build
# generate build configuration
cp ../cmake/config.cmake .
Use vim or any editor you prefer to edit build/config.cmake, and append these settings to the file.
set(CMAKE_BUILD_TYPE RelWithDebInfo)
set(USE_OPENCL ON)
set(HIDE_PRIVATE_SYMBOLS ON)
# You must replace <LLVM_PATH> with your LLVM location.
set(USE_LLVM <LLVM_PATH>/clang+llvm-16.0.4-aarch64-linux-gnu/bin/llvm-config)
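If you prefer not to edit the file by hand, a small sketch of the same change is to append the settings to the copied config.cmake; later set() calls override the defaults earlier in the file. The LLVM path below is an assumption based on the quick-start above, so replace it with your own location.
# Append the four settings in one go (run inside the build directory).
cat >> config.cmake <<'EOF'
set(CMAKE_BUILD_TYPE RelWithDebInfo)
set(USE_OPENCL ON)
set(HIDE_PRIVATE_SYMBOLS ON)
set(USE_LLVM /root/clang+llvm-16.0.4-aarch64-linux-gnu/bin/llvm-config)
EOF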
Then compile it. It takes about 20 minutes.
cmake ..
make -j4
Finally, install the tvm Python package.
cd ../python
pip3 install --user .
If you later move the tvm directory somewhere else, you must reinstall this Python package.
Verify the installation; you will see a help message if the package was installed successfully.
tvmc
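Beyond tvmc, you can also check that tvm.relax is importable, since that is the whole reason for building the mlc-ai/relax fork rather than upstream Apache TVM.
# Should print the tvm install path with no ImportError.
python3 -c "import tvm; from tvm import relax; print(tvm.__file__)"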
MLC-LLM
Install the Rust environment.
sudo apt-get update
sudo apt-get install -y rustc cargo
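Optionally, verify that the Rust toolchain is on PATH; it will be needed when building mlc_chat_cli later.
# Both commands should print a version string.
rustc --version
cargo --version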
Return to the top folder, download mlc-llm from GitHub, and install the Python package.
# clone mlc-llm from GitHub
git clone --recursive https://github.com/mlc-ai/mlc-llm.git && cd mlc-llm
pip3 install --user .
Verify the installation; you will see a help message if the package was installed successfully.
python3 -m mlc_llm.build --help
Compile Model
I use ChatGLM2-6B here as an example. In the mlc-llm folder, download the model.
mkdir -p dist/models && cd dist/models
# 11GB space used.
git lfs install
git clone https://huggingface.co/THUDM/chatglm2-6b
Add the vocab_size field to the model's config.json:
vim chatglm2-6b/config.json
{
...,
"vocab_size": 65024
}
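If you would rather script this edit, here is a hypothetical sketch using Python's json module; it assumes you are still in dist/models and that the clone is named chatglm2-6b.
# Insert "vocab_size" into config.json without opening an editor.
python3 - <<'EOF'
import json

path = "chatglm2-6b/config.json"  # assumed location of the cloned model
with open(path) as f:
    cfg = json.load(f)
cfg.setdefault("vocab_size", 65024)  # value from this tutorial
with open(path, "w") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
EOF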
Then compile it in the mlc-llm folder.
cd ../..
python3 -m mlc_llm.build --model chatglm2-6b --target opencl --max-seq-len 8192 --quantization q4f16_1
After about 5 minutes, you will see the dist/chatglm2-6b-q4f16_1 folder. In this folder, you will see 3 files:
chatglm2-6b-q4f16_1-opencl.so mod_cache_before_build.pkl params
chatglm2-6b-q4f16_1-opencl.so is the final product.
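Optionally, confirm the artifact is a native shared library for your board.
# Should report an ELF 64-bit shared object for ARM aarch64.
file dist/chatglm2-6b-q4f16_1/chatglm2-6b-q4f16_1-opencl.so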
Attention
- You can change quantization to a different option such as: autogptq_llama_q4f16_0, autogptq_llama_q4f16_1, q0f16, q0f32, q3f16_0, q3f16_1, q4f16_0, q4f16_1, q4f16_2, q4f16_ft, q4f32_0, q4f32_1, q8f16_ft, q8f16_1.
- q6f16_1 takes about 5GB of memory and q8f16_1 takes about 8GB. Make sure your device has enough memory; 16GB is necessary in most cases.
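A quick way to see how much memory is actually available before picking a quantization option (swap and zram setups vary between images):
# Check total and available memory.
free -h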
Use the model
Compile mlc_chat_cli
You need to compile either the mlc_chat_cli command or the mlc_chat Python package.
Return to the mlc-llm folder.
# create build directory
mkdir -p build && cd build
# generate build configuration
python3 ../cmake/gen_cmake_config.py
# build `mlc_chat_cli`
cmake .. && cmake --build . --parallel $(nproc) && cd ..
Verify the build
# expected to see `mlc_chat_cli`, `libmlc_llm.so` and `libtvm_runtime.so`
ls -l ./build/
# expected to see help message
./build/mlc_chat_cli --help
Use mlc_chat_cli
We use mlc_chat_cli in this example:
./build/mlc_chat_cli --model <model_name> --device <device_name>
In the previous chapter we produced chatglm2-6b-q4f16_1-opencl.so, so replace <model_name> with chatglm2-6b-q4f16_1 and <device_name> with opencl:
./build/mlc_chat_cli --model chatglm2-6b-q4f16_1 --device opencl
The final result.
References
https://tvm.apache.org/docs/install/from_source.html#build-the-shared-library
https://llm.mlc.ai/docs/compilation/compile_models.html
https://blog.mlc.ai/2023/08/09/GPU-Accelerated-LLM-on-Orange-Pi
https://zhuanlan.zhihu.com/p/650110025
https://llm.mlc.ai/docs/install
Author: A Chang
Source: Zhihu