KTransformers is an open-source framework designed to optimize inference for large language models. It supports heterogeneous GPU/CPU computation and is specifically tuned for the sparsity of MoE architectures, which substantially lowers the hardware requirements and makes it possible to run models as large as DeepSeek-R1 on limited resources.
Hardware configuration:
CPU: Intel Xeon Silver 4310 @ 2.10GHz, 24 physical cores (12 cores per socket) with Hyper-Threading enabled, for a total of 48 logical processors.
Memory: 1 TB of DDR4 RAM at 3200MHz.
GPU: NVIDIA GeForce RTX 4090 with 24GB of VRAM.
Software environment:
OS: Ubuntu 20.04
CUDA: 12.4
Framework: KTransformers v0.2.1, which supports local inference of the DeepSeek-R1 model.
Model weights: DeepSeek-R1-Q4_K_M
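Before going further, it is worth verifying that the machine actually matches the configuration above. A quick check using standard Linux/CUDA tools (nothing KTransformers-specific):

```bash
# CPU model, sockets, physical cores and logical processors
lscpu | grep -E "Model name|Socket|Core|Thread|^CPU\(s\)"

# Total system memory; the quantized DeepSeek-R1 weights are held mostly in RAM
free -h

# GPU model, VRAM and driver status
nvidia-smi

# CUDA toolkit version used for compilation
nvcc --version
```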
KTransformers:
ktransformers: https://github.com/kvcache-ai/ktransformers
ktransformers installation guide: https://kvcache-ai.github.io/ktransformers/en/install.html
Model files:
huggingface (requires a proxy from mainland China): https://huggingface.co/unsloth/DeepSeek-R1-GGUF
modelscope (recommended within China): https://modelscope.cn/models/unsloth/DeepSeek-R1-GGUF
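For the GGUF weights, only the Q4_K_M shards are needed. A minimal download sketch using huggingface-cli; the `--include` pattern assumes the quantization files sit in a `DeepSeek-R1-Q4_K_M/` subfolder of the repo, so list the repository files first and adjust if needed:

```bash
pip install -U "huggingface_hub[cli]"

# Optional mirror if huggingface.co is unreachable
export HF_ENDPOINT=https://hf-mirror.com

# Download only the Q4_K_M quantization into a local directory
huggingface-cli download unsloth/DeepSeek-R1-GGUF \
  --include "DeepSeek-R1-Q4_K_M/*" \
  --local-dir ./DeepSeek-R1-GGUF
```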
Implementation steps:
1. Create a virtual environment with Conda
We recommend using Conda to create a Python 3.11 virtual environment for running the program:

```bash
conda create --name ktransformers python=3.11
conda activate ktransformers
```
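Inside the activated environment, the official installation guide also expects PyTorch (with CUDA support) and a few build prerequisites to be installed before compiling. A sketch; the exact package set and versions may differ, so follow the versions pinned in the install guide:

```bash
# PyTorch wheels with CUDA support, plus common build-time dependencies
# (package list is an assumption; defer to the official install guide)
pip install torch torchvision torchaudio
pip install packaging ninja cpufeature numpy
```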
2. Model loading
Download the original DeepSeek model configuration files:
modelscope: https://modelscope.cn/models/deepseek-ai/DeepSeek-R1
huggingface: https://huggingface.co/deepseek-ai/DeepSeek-R1
(The model files in this walkthrough were downloaded from modelscope. Note that if KTransformers does not find the model files locally, it will try to download the corresponding model from Hugging Face; because of network restrictions in mainland China, that direct download can fail. One workaround is to point Hugging Face at a mirror: `export HF_ENDPOINT=https://hf-mirror.com`.)
Check that the configuration files are complete.
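A sketch of fetching only the configuration and tokenizer files (the original safetensors weights are not needed, since inference uses the GGUF weights); the `--exclude` pattern and the file names in the comment are assumptions, so compare against the repository contents:

```bash
# Grab everything except the original weight shards
huggingface-cli download deepseek-ai/DeepSeek-R1 \
  --exclude "*.safetensors" \
  --local-dir ./DeepSeek-R1

# Sanity check: the directory should contain files such as config.json,
# configuration_deepseek.py, modeling_deepseek.py, tokenizer.json and tokenizer_config.json
ls ./DeepSeek-R1
```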
3. Download the source code and compile
Fetch the source code:

```bash
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule init
git submodule update
```

Install (Linux):

```bash
bash install.sh
pip install flash-attn
```

libstdc++ also needs to be upgraded manually:

```bash
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install --only-upgrade libstdc++6
conda install -c conda-forge libstdcxx-ng
```
The corresponding terminal output:

```
$ git clone https://github.com/kvcache-ai/ktransformers.git
Cloning into 'ktransformers'...
remote: Enumerating objects: 1866, done.
remote: Counting objects: 100% (655/655), done.
remote: Compressing objects: 100% (300/300), done.
remote: Total 1866 (delta 440), reused 359 (delta 355), pack-reused 1211 (from 2)
Receiving objects: 100% (1866/1866), 9.33 MiB | 7.65 MiB/s, done.
Resolving deltas: 100% (990/990), done.
$ cd ktransformers/
$ git submodule init
Submodule 'third_party/llama.cpp' (https://github.com/ggerganov/llama.cpp.git) registered for path 'third_party/llama.cpp'
Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11'
$ git submodule update
Cloning into '/datb/DeepSeek/ktransformers/third_party/llama.cpp'...
Cloning into '/datb/DeepSeek/ktransformers/third_party/pybind11'...
Submodule path 'third_party/llama.cpp': checked out 'a94e6ff8774b7c9f950d9545baf0ce35e8d1ed2f'
Submodule path 'third_party/pybind11': checked out 'bb05e0810b87e74709d9f4c4545f1f57a1b386f5'
$ bash install.sh
Successfully built ktransformers
Installing collected packages: wcwidth, zstandard, tomli, tenacity, sniffio, six, pyproject_hooks, pydantic-core, psutil, propcache, orjson, ninja, multidict, jsonpointer, h11, greenlet, frozenlist, exceptiongroup, colorlog, click, attrs, async-timeout, annotated-types, aiohappyeyeballs, yarl, uvicorn, SQLAlchemy, requests-toolbelt, pydantic, jsonpatch, httpcore, build, blessed, anyio, aiosignal, starlette, httpx, aiohttp, langsmith, fastapi, accelerate, langchain-core, langchain-text-splitters, langchain, ktransformers
Successfully installed SQLAlchemy-2.0.38 accelerate-1.3.0 aiohappyeyeballs-2.4.6 aiohttp-3.11.12 aiosignal-1.3.2 annotated-types-0.7.0 anyio-4.8.0 async-timeout-4.0.3 attrs-25.1.0 blessed-1.20.0 build-1.2.2.post1 click-8.1.8 colorlog-6.9.0 exceptiongroup-1.2.2 fastapi-0.115.8 frozenlist-1.5.0 greenlet-3.1.1 h11-0.14.0 httpcore-1.0.7 httpx-0.28.1 jsonpatch-1.33 jsonpointer-3.0.0 ktransformers-0.2.1+cu128torch26fancy langchain-0.3.18 langchain-core-0.3.35 langchain-text-splitters-0.3.6 langsmith-0.3.8 multidict-6.1.0 ninja-1.11.1.3 orjson-3.10.15 propcache-0.2.1 psutil-7.0.0 pydantic-2.10.6 pydantic-core-2.27.2 pyproject_hooks-1.2.0 requests-toolbelt-1.0.0 six-1.17.0 sniffio-1.3.1 starlette-0.45.3 tenacity-9.0.0 tomli-2.2.1 uvicorn-0.34.0 wcwidth-0.2.13 yarl-1.18.3 zstandard-0.23.0
Installation completed successfully
```
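With the installation finished, inference can be started from the local_chat script that ships with the repository. A minimal sketch, assuming the configuration files were saved to ./DeepSeek-R1 and the Q4_K_M GGUF shards to ./DeepSeek-R1-GGUF/DeepSeek-R1-Q4_K_M; the paths and the exact set of flags may vary between KTransformers versions, so check `python ./ktransformers/local_chat.py --help` first:

```bash
# Interactive chat; --cpu_infer sets how many CPU threads serve the MoE experts
# offloaded to system memory (tune it to the number of physical cores)
python ./ktransformers/local_chat.py \
  --model_path ./DeepSeek-R1 \
  --gguf_path ./DeepSeek-R1-GGUF/DeepSeek-R1-Q4_K_M \
  --cpu_infer 24 \
  --max_new_tokens 1000
```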
References:
https://www.cnblogs.com/dechinphy/p/18719866/ktransformer
https://zhuanlan.zhihu.com/p/25811017239