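Updates to nnU-Net and nnDetection mean that a default installation may end up unable to use the GPU, so here is a fairly detailed record of how I set up the nnDetection environment.
1. Create a virtual environment: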
Please note that nndetection requires Python 3.8+. Please use PyTorch 1.X version for now and not 2.0
This calls for Python 3.8 or later and PyTorch 1.x, i.e. at least 1.0 but below 2.0. All things considered, I chose Python 3.9:
conda create --name xxx python==3.9
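Once the environment is activated, a quick sanity check of the interpreter version can save a rebuild later. This is only a minimal sketch: the helper name `python_ok` is made up for illustration, and the (3, 8) floor is taken from the README line quoted above.

```python
import sys

def python_ok(version_info=sys.version_info, minimum=(3, 8)):
    """Return True when the interpreter is at least `minimum` (major, minor)."""
    return tuple(version_info[:2]) >= minimum

if __name__ == "__main__":
    status = "OK" if python_ok() else "too old for nnDetection"
    print("Python", sys.version.split()[0], status)
```

Run it with the `python` of the new environment, not the system one, so that the version being checked is the one nnDetection will actually use.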
2. Install PyTorch in the virtual environment.
Install CUDA (>10.1) and cudnn (make sure to select compatible versions!)
This requires CUDA (>10.1) with a matching cuDNN. Since CUDA was already installed, I first checked the CUDA version on this machine:
(xxxxxxx) [xxxx@xxxxxxxxx ~]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0
This meets the requirement.
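The same check can be scripted when several servers have to be inspected. The sketch below parses the "release X.Y" part of the `nvcc --version` output shown above; the function name `nvcc_release` is hypothetical, and the sample string is copied from that output.

```python
import re

def nvcc_release(output):
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None

# Sample line copied from the nvcc output in this post.
sample = "Cuda compilation tools, release 11.1, V11.1.74"
release = nvcc_release(sample)

# The README asks for CUDA > 10.1; tuples compare element by element.
print(release, release is not None and release > (10, 1))
```

To use it on a live machine, the text could come from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`, assuming `nvcc` is on the PATH.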
[Optional] Depending on your GPU you might need to set TORCH_CUDA_ARCH_LIST, check compute capability.
This step is optional; I had no need to look up the GPU's compute capability.
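If the value is needed after all, it can be read from PyTorch itself once torch is installed. A small sketch, assuming a single GPU at index 0; `arch_list_entry` is a made-up helper, while `torch.cuda.get_device_capability` is the real PyTorch call.

```python
def arch_list_entry(major, minor):
    """Format a compute capability as a TORCH_CUDA_ARCH_LIST entry, e.g. '7.5'."""
    return f"{major}.{minor}"

try:
    import torch  # optional: only needed to query the real GPU
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available():
    # Returns a (major, minor) tuple for the given device index.
    major, minor = torch.cuda.get_device_capability(0)
    print("export TORCH_CUDA_ARCH_LIST=" + arch_list_entry(major, minor))
else:
    print("torch or a visible GPU is missing; look the capability up manually")
```

The printed line is meant to be pasted into the shell before running `pip install -e .`.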
Install torch (make sure to match the pytorch and CUDA versions!) (requires pytorch >1.10+) and torchvision(make sure to match the versions!).
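This raises the PyTorch requirement further to 1.10 or later, and the build has to match the CUDA version. On the official site I found only the following entry that qualifies:
So I installed this version of PyTorch: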
Collecting torch==1.10.0+cu111
Downloading https://download.pytorch.org/whl/cu111/torch-1.10.0%2Bcu111-cp39-cp39-linux_x86_64.whl (2137.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━ 1.8/2.1 GB 6.4 kB/s eta 12:30:51
ERROR: Wheel 'torch' located at /tmp/pip-unpack-zyhz8qfc/torch-1.10.0+cu111-cp39-cp39-linux_x86_64.whl is invalid.
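The download failed. But this kind of wheel apparently ships with its own cu111 runtime, so could I download a build that bundles cu102 instead? (The main reason for not considering cu12 is that the NVIDIA driver on this machine reports NVIDIA-SMI 450.57 Driver Version: 450.57 CUDA Version: 11.0.) Here is the attempt:
So I installed this version of PyTorch: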
(xxxxxxx) [xxxx@xxxxxxxxx ~]$ pip install torch==1.12.1+cu102 torchvision==0.13.1+cu102 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu102
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu102
Collecting torch==1.12.1+cu102
Downloading https://download.pytorch.org/whl/cu102/torch-1.12.1%2Bcu102-cp39-cp39-linux_x86_64.whl (776.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 776.4/776.4 MB 1.3 MB/s eta 0:00:00
Collecting torchvision==0.13.1+cu102
Downloading https://download.pytorch.org/whl/cu102/torchvision-0.13.1%2Bcu102-cp39-cp39-linux_x86_64.whl (19.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.1/19.1 MB 1.5 MB/s eta 0:00:00
Collecting torchaudio==0.12.1
Downloading https://download.pytorch.org/whl/cu102/torchaudio-0.12.1%2Bcu102-cp39-cp39-linux_x86_64.whl (3.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.7/3.7 MB 1.3 MB/s eta 0:00:00
Collecting typing-extensions (from torch==1.12.1+cu102)
Downloading typing_extensions-4.9.0-py3-none-any.whl.metadata (3.0 kB)
Collecting numpy (from torchvision==0.13.1+cu102)
Using cached numpy-1.26.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting requests (from torchvision==0.13.1+cu102)
Using cached requests-2.31.0-py3-none-any.whl.metadata (4.6 kB)
Collecting pillow!=8.3.*,>=5.3.0 (from torchvision==0.13.1+cu102)
Using cached Pillow-10.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.5 kB)
Collecting charset-normalizer<4,>=2 (from requests->torchvision==0.13.1+cu102)
Using cached charset_normalizer-3.3.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (33 kB)
Collecting idna<4,>=2.5 (from requests->torchvision==0.13.1+cu102)
Using cached idna-3.6-py3-none-any.whl.metadata (9.9 kB)
Collecting urllib3<3,>=1.21.1 (from requests->torchvision==0.13.1+cu102)
Using cached urllib3-2.1.0-py3-none-any.whl.metadata (6.4 kB)
Collecting certifi>=2017.4.17 (from requests->torchvision==0.13.1+cu102)
Using cached certifi-2023.11.17-py3-none-any.whl.metadata (2.2 kB)
Using cached Pillow-10.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.5 MB)
Using cached numpy-1.26.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
Using cached requests-2.31.0-py3-none-any.whl (62 kB)
Downloading typing_extensions-4.9.0-py3-none-any.whl (32 kB)
Using cached certifi-2023.11.17-py3-none-any.whl (162 kB)
Using cached charset_normalizer-3.3.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (142 kB)
Using cached idna-3.6-py3-none-any.whl (61 kB)
Using cached urllib3-2.1.0-py3-none-any.whl (104 kB)
Installing collected packages: urllib3, typing-extensions, pillow, numpy, idna, charset-normalizer, certifi, torch, requests, torchvision, torchaudio
Successfully installed certifi-2023.11.17 charset-normalizer-3.3.2 idna-3.6 numpy-1.26.2 pillow-10.1.0 requests-2.31.0 torch-1.12.1+cu102 torchaudio-0.12.1+cu102 torchvision-0.13.1+cu102 typing-extensions-4.9.0 urllib3-2.1.0
PyTorch 1.12.1 installed successfully. Next, check whether CUDA is usable with this build:
Python 3.9.0 (default, Nov 15 2020, 14:28:56)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
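As shown above, it is. So when a server has no CUDA toolkit installed, the precompiled CUDA runtime bundled with the torch wheel can apparently stand in for it (though its version probably still has to be no higher than the CUDA Version: 11.0 reported by the driver).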
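The rule of thumb above (wheel CUDA no higher than the driver's CUDA Version) can be written down as a tiny check. This is a sketch of my guess, not NVIDIA's full compatibility rules, which have more nuance; the helper names are invented, and the `cuXYZ` parsing assumes a single-digit minor version.

```python
import re

def wheel_cuda(tag):
    """Turn a wheel suffix such as 'cu102' or 'cu111' into (major, minor)."""
    match = re.fullmatch(r"cu(\d+)(\d)", tag)
    if match is None:
        raise ValueError(f"unrecognised wheel tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))

def driver_cuda(smi_text):
    """Read the 'CUDA Version: X.Y' field from nvidia-smi header text."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    if match is None:
        raise ValueError("no 'CUDA Version' field found")
    return int(match.group(1)), int(match.group(2))

def wheel_fits_driver(tag, smi_text):
    """Rule of thumb from this post: wheel CUDA <= driver-reported CUDA."""
    return wheel_cuda(tag) <= driver_cuda(smi_text)

# Header values copied from this machine's nvidia-smi output.
header = "NVIDIA-SMI 450.57 Driver Version: 450.57 CUDA Version: 11.0"
for tag in ("cu102", "cu111"):
    print(tag, wheel_fits_driver(tag, header))
```

Note that the cu111 attempt earlier failed because the downloaded wheel was invalid, not because of this rule. Inside the environment, `torch.version.cuda` reports which CUDA version the installed wheel was built with.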
3. Install the nnDetection framework
Clone nnDetection, cd [path_to_repo] and pip install -e .
(lungdoc) [pacs@localhost ~]$ git clone https://github.com/MIC-DKFZ/nnDetection.git
Cloning into 'nnDetection'...
remote: Enumerating objects: 1449, done.
remote: Counting objects: 100% (154/154), done.
remote: Compressing objects: 100% (65/65), done.
remote: Total 1449 (delta 119), reused 89 (delta 89), pack-reused 1295
Receiving objects: 100% (1449/1449), 1.29 MiB | 1.72 MiB/s, done.
Resolving deltas: 100% (848/848), done.
(lungdoc) [pacs@localhost ~]$ cd nnDetection
(lungdoc) [pacs@localhost nnDetection]$ pip install -e .
Obtaining file:///home/pacs/nnDetection
Preparing metadata (setup.py) ... done
...........................................
RuntimeError:
The detected CUDA version (11.1) mismatches the version that was used to compile
PyTorch (10.2). Please make sure to use the same CUDA versions.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
The other dependencies installed without trouble, but compiling nms.cu used the CUDA toolkit installed on the machine.
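A pre-flight comparison before `pip install -e .` would have caught this. The sketch below assumes, as a simplification, that at least the major versions of the local toolkit and of the CUDA used to build PyTorch must agree; the exact rule inside PyTorch's extension builder may differ between releases. `majors_match` is a made-up helper.

```python
def majors_match(local_cuda, torch_cuda):
    """Compare the major part of two 'X.Y' CUDA version strings."""
    return local_cuda.split(".")[0] == torch_cuda.split(".")[0]

# The pair from the error message: local nvcc 11.1 vs. PyTorch built with 10.2.
print(majors_match("11.1", "10.2"))  # prints False
```

On a real machine the first argument would come from `nvcc --version` and the second from `torch.version.cuda`. It is also worth checking which toolkit the build actually picks up: it is typically the one `CUDA_HOME` points to, or else the `nvcc` found on the PATH.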
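To avoid redoing everything, I installed CUDA 10.2 on the machine itself, along with the matching cuDNN library (the cu102 bundled in the virtual environment is probably only good for running prebuilt binaries; compiling on the spot does not work with it). The installers were already on this machine, so I did not download them again. Then I ran the install once more.
It still failed, frustratingly. I then checked an old environment on another system: pytorch 1.9.1, CUDA 11.1.
Uninstall PyTorch again: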
torch 1.12.1+cu102
torchaudio 0.12.1+cu102
torchvision 0.13.1+cu102
Install:
torch 1.12.1
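Compiling again still produced the original error, as shown in the figure:
Following the error message, I found the Zhihu post 【BUG】关于Pytoch中CUDA扩展的本地安装 ("[BUG] On locally installing CUDA extensions in PyTorch") and tried its fix:
## The error came from the cloneable.h file in the torch API. After some experimenting, take the three statements at lines 46, 58, and 70 of cloneable.h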
copy->parameters_.size() == parameters_.size()
copy->buffers_.size() == buffers_.size()
copy->children_.size() == children_.size()
## and change them, respectively, to
copy->parameters_.size() == this -> parameters_.size()
copy->buffers_.size() == this -> buffers_.size()
copy->children_.size() == this -> children_.size()
## After saving, the installation succeeded.
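The three edits follow one pattern, so they can be expressed as a single substitution. The sketch below works on a string only; the location of cloneable.h inside the torch installation is not given here, so reading and writing the file is left out. `qualify_with_this` is a hypothetical helper, and it would rewrite every matching comparison in the text it is given, not just lines 46, 58, and 70, so back up the header first.

```python
import re

# Matches the right-hand side of the three comparisons quoted above.
PATTERN = re.compile(r"==\s*(parameters_|buffers_|children_)\.size\(\)")

def qualify_with_this(source):
    """Rewrite `== X.size()` to `== this->X.size()` for the three members."""
    return PATTERN.sub(r"== this->\1.size()", source)

line = "copy->parameters_.size() == parameters_.size()"
print(qualify_with_this(line))
```

Lines that already contain `this->` are left alone, so applying the function twice gives the same result as applying it once.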
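Kudos to the author. A commenter on that post pointed out that the root cause is a GCC version that is too old; GCC 5.4 on this machine triggers the same error.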
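Since the compiler turned out to be the culprit, it is worth checking its version before building. A minimal sketch: the only threshold used is the 5.4 that is known to fail from this post, so being newer than that is a hint and not a guarantee. The helper names and the sample version strings are illustrative.

```python
import re

KNOWN_BAD = (5, 4, 0)  # the GCC version that failed on this machine

def gcc_version(output):
    """Extract the first X.Y.Z triple from `gcc --version` text, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    return tuple(int(part) for part in match.groups()) if match else None

def newer_than_known_bad(output):
    """True when the reported GCC is strictly newer than KNOWN_BAD."""
    version = gcc_version(output)
    return version is not None and version > KNOWN_BAD

# Illustrative first lines of `gcc --version`; real output varies by distro.
for text in ("gcc (GCC) 5.4.0", "gcc (GCC) 9.3.0"):
    print(gcc_version(text), newer_than_known_bad(text))
```

If a newer GCC is available, upgrading it is probably a cleaner fix than editing cloneable.h inside the installed torch package.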