Installing the CUDA Toolkit
- Installing the CUDA Toolkit
- Error message
- Analysis
- Download CUDA
- Run the installer
- Configure environment variables
- Verification
Error message
Running pip install flash_attn
to install an inference acceleration library failed with the following error:
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting flash_attn
Downloading https://mirrors.aliyun.com/pypi/packages/72/94/06f618bb338ec7203b48ac542e73087362b7750f9c568b13d213a3f181bb/flash_attn-2.5.8.tar.gz (2.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 1.6 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [20 lines of output]
fatal: not a git repository (or any of the parent directories): .git
/tmp/pip-install-fg7pt8f4/flash-attn_1e4c76d3ba9f4a5d968930613e3c4bd7/setup.py:78: UserWarning: flash_attn was requested, but nvcc was not found. Are you sure your environment has nvcc available? If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
warnings.warn(
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-fg7pt8f4/flash-attn_1e4c76d3ba9f4a5d968930613e3c4bd7/setup.py", line 134, in <module>
CUDAExtension(
File "/usr/local/program/miniconda3/envs/llama3/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1077, in CUDAExtension
library_dirs += library_paths(cuda=True)
File "/usr/local/program/miniconda3/envs/llama3/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1204, in library_paths
if (not os.path.exists(_join_cuda_home(lib_dir)) and
File "/usr/local/program/miniconda3/envs/llama3/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2419, in _join_cuda_home
raise OSError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
torch.__version__ = 2.3.0+cu121
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
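Analysis
The NVIDIA driver is already installed on this system, and the driver ships with its own CUDA support, which the nvidia-smi
command shows.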
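Note:
At first this was confusing: the GPU already reports a CUDA version, so why does the build still ask for CUDA?
The reason:
CUDA exposes two main APIs, the driver API and the runtime API. The CUDA that comes with the GPU driver covers only the driver API (nvidia-smi reports the highest CUDA version the driver supports). The runtime API, along with the nvcc compiler and the headers needed to build CUDA extensions such as flash_attn from source, comes from the CUDA Toolkit, and CUDA_HOME is expected to point at the Toolkit's install root. The Toolkit therefore has to be installed separately.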
Fixing the error:
The CUDA Toolkit is usually installed under
/usr/local/
and a check confirmed that no CUDA Toolkit directory exists there.
Since the CUDA Toolkit is not installed, the direct fix is to install it.
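The check above can be sketched in Python. This is a minimal sketch of a typical lookup order (environment variable, then nvcc on PATH, then the conventional /usr/local/cuda location); it approximates what build tooling tends to do and is not PyTorch's actual code. The function name find_cuda_home and its injectable parameters are hypothetical, introduced only so the logic can be exercised without a GPU.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def find_cuda_home(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    isdir: Callable[[str], bool] = os.path.isdir,
    default: str = "/usr/local/cuda",
) -> Optional[str]:
    """Return a likely CUDA Toolkit root, or None if nothing is found."""
    # 1. An explicitly set environment variable wins.
    for name in ("CUDA_HOME", "CUDA_PATH"):
        value = env.get(name)
        if value:
            return value
    # 2. Otherwise derive the root from nvcc: <root>/bin/nvcc -> <root>.
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    # 3. Finally fall back to the conventional install location.
    return default if isdir(default) else None


if __name__ == "__main__":
    # None here means no Toolkit was found by any of the three steps.
    print(find_cuda_home())
```

On a machine in the state described here (no CUDA_HOME, no nvcc, nothing under /usr/local/cuda), the function returns None, which corresponds to the situation behind the OSError in the log.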
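Download CUDA
The CUDA Toolkit is CUDA's development kit; "installing CUDA" in practice means installing the CUDA Toolkit.
Visit https://developer.nvidia.com/cuda-toolkit-archive
and choose the CUDA version you need.
For compatibility, run the nvidia-smi
command to check the GPU driver version and the CUDA version it supports.
The driver here reports CUDA 12.2, so CUDA Toolkit 12.2 is the version to download. (The error log above shows torch 2.3.0+cu121, so a 12.x Toolkit also matches the CUDA major version of the installed PyTorch build.)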
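The version check can also be scripted. The sketch below pulls the "CUDA Version" field out of nvidia-smi's header and applies a conservative rule: pick a Toolkit no newer than what the driver reports. The function names and the sample header line are illustrative assumptions; the sample was not captured from the machine in this article.

```python
import re
from typing import Optional, Tuple


def driver_cuda_version(smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_supported(toolkit: Tuple[int, int], driver: Tuple[int, int]) -> bool:
    """Conservative rule: the Toolkit should not be newer than the driver's CUDA version."""
    return toolkit <= driver


# Illustrative header line (assumed layout), not real output from this machine.
sample = "| NVIDIA-SMI 535.54.03    Driver Version: 535.54.03    CUDA Version: 12.2 |"

if __name__ == "__main__":
    driver = driver_cuda_version(sample)
    print(driver, toolkit_supported((12, 2), driver))
```

In real use, the text passed to driver_cuda_version would come from running nvidia-smi, for example through subprocess.run with capture_output=True and text=True.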
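On the download page, select: Linux, x86_64, Ubuntu, version 22.04, and the runfile (local) installer type.
The page then shows the installation instructions below the selection.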
Run the installer
wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run
sudo sh cuda_12.2.0_535.54.03_linux.run
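Select Continue
and press Enter.
Type accept
to accept the license agreement.
Because the driver is already installed, deselect the Driver entry (it is checked, [X], by default), then choose Install to start the installation.
The following summary indicates that the Toolkit installed successfully; the "Incomplete installation" warning only reflects that the driver was deliberately not selected.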
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-12.2/
Please make sure that
- PATH includes /usr/local/cuda-12.2/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-12.2/lib64, or, add /usr/local/cuda-12.2/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.2/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 535.00 is required for CUDA 12.2 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Logfile is /var/log/cuda-installer.log
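Configure environment variables
Edit the ~/.bashrc file (vim ~/.bashrc)
and add the following environment variables; see the official documentation: Environment Setup
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export PATH=$PATH:/usr/local/cuda/bin
export CUDA_HOME=/usr/local/cuda
CUDA_HOME must name a single directory, not a colon-separated list, so it is assigned directly. Afterwards, run source ~/.bashrc (or open a new shell) so the variables take effect.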
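A quick sanity check of these variables can be sketched as follows. It assumes a Linux layout with bin and lib64 under the Toolkit root and treats CUDA_HOME as a single directory rather than a colon-separated search path. The function name check_cuda_env and its parameters are hypothetical.

```python
import os
from typing import Callable, List, Mapping


def check_cuda_env(
    env: Mapping[str, str] = os.environ,
    isdir: Callable[[str], bool] = os.path.isdir,
) -> List[str]:
    """Return a list of problems found in the CUDA-related environment variables."""
    problems: List[str] = []
    home = env.get("CUDA_HOME", "")
    if not home:
        problems.append("CUDA_HOME is not set")
    elif ":" in home:
        # CUDA_HOME names one directory; it is not a search path like PATH.
        problems.append("CUDA_HOME must be a single directory, got %r" % home)
    elif not isdir(home):
        problems.append("CUDA_HOME does not exist: %s" % home)
    else:
        bin_dir = home.rstrip("/") + "/bin"
        lib_dir = home.rstrip("/") + "/lib64"
        if bin_dir not in env.get("PATH", "").split(":"):
            problems.append("PATH does not include %s" % bin_dir)
        if lib_dir not in env.get("LD_LIBRARY_PATH", "").split(":"):
            problems.append("LD_LIBRARY_PATH does not include %s" % lib_dir)
    return problems


if __name__ == "__main__":
    for problem in check_cuda_env():
        print(problem)
```

An empty list means the three variables are consistent with each other; it does not prove that the Toolkit itself works, which is what the verification step is for.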
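Verification
Run the nvcc -V
command to check that CUDA was installed successfully.
nvcc is the CUDA compiler, shipped in the bin directory of the CUDA Toolkit, much as gcc is a compiler for C.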
root@master:~# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0
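The same verification can be done programmatically. The sketch below extracts the release number from nvcc -V output and compares its major version with the cuXYZ tag of a PyTorch version string such as the 2.3.0+cu121 seen in the error log. The helper names are hypothetical, and the assumption that the last digit of the tag is the minor version is mine.

```python
import re
from typing import Optional


def nvcc_release(nvcc_output: str) -> Optional[str]:
    """Return the 'release X.Y' version from `nvcc -V` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def same_cuda_major(torch_version: str, release: str) -> bool:
    """Compare the +cuXYZ tag of a PyTorch version with an nvcc release."""
    match = re.search(r"\+cu(\d+)", torch_version)
    if not match:
        return False  # CPU-only build or unrecognized tag
    # Assumption: the last digit is the minor version, e.g. "121" -> major "12".
    torch_major = match.group(1)[:-1]
    return torch_major == release.split(".")[0]


# Line taken from the nvcc -V output shown above.
sample = "Cuda compilation tools, release 12.2, V12.2.91"

if __name__ == "__main__":
    release = nvcc_release(sample)
    print(release, same_cuda_major("2.3.0+cu121", release))
```

A matching major version is a reasonable precondition for retrying pip install flash_attn, since the extension is compiled with nvcc against the installed PyTorch build.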