好几次遇到问为什么安装的 tensorflow 不能调用GPU,之前搞定过几次,前两天又有人问,又捣鼓了很久才搞定,这里简单记录一下我遇到的问题,以及解决方案。
一、安装方法
(一)安装并更新 conda
1.安装 conda
安装 conda 很重要,使用 pip 安装 tensorflow-gpu 太多问题了(这里默认已经安装了conda)。
2.更新 conda
conda update -n base -c defaults conda --repodata-fn=repodata.json
之前根据百度,都是执行:
conda update -n base -c defaults conda
然后,首先该命令无法更新到最新的 conda;其次,我们在使用 conda -V 查看版本时,conda 版本显示错误。
该解决方案来自于 GitHub:I got update warning message but unable to update · Issue #12519 · conda/conda · GitHubhttps://github.com/conda/conda/issues/12519
将 conda 的 base 更新到最新,我觉得原因是能够同步最新的包依赖关系,过时的版本可能导致依赖出问题。
(二)创建环境
1.创建环境
conda create -n TensorFlow2.4 python=3.9
当然,这里可以根据自己的 CUDA 版本选择对应的 tensorflow 版本,我的 CUDA 版本为 11.3 :
(Tensorflow2.4) name@eclab:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
不显示的话,可以自行进一步搜索为什么 nvcc -V 不显示:
Ubuntu20.04LTS系统CUDA已经安装但nvcc -V显示command not found_nvcc -v 提示未找到命令_AISecurity盐究员的博客-CSDN博客安装了NVIDIA驱动程序,同时也安装了CUDA,但使用nvcc -V使用nvcc -V命令可以查看CUDA的版本,如下所示为正常的输入、输出内容,可以看出通过nvcc -V命令,可以看到目前所使用的CUDA版本。_nvcc -v 提示未找到命令https://blog.csdn.net/m0_38068876/article/details/127836484 注 .bashrc 文件添加环境变量时,需要根据 /usr/local/ 下的 cuda实际情况进行修改,这里展示我的情况:
(Tensorflow2.4) name@eclab:~$ cd /usr/local/
(Tensorflow2.4) name@eclab:/usr/local$ ls
bin cuda-11.3 games lib sbin src
cuda etc include man share sunlogin
这里有 cuda 软链接,链接到 cuda-11.3,所以建议使用下面命令:
# cuda-11.3
export PATH=/usr/local/cuda-11.3/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.3/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
(三)安装 tensorflow-gpu
1.安装
激活环境:
conda activate TensorFlow2.4
安装 tensorflow-gpu:
conda install tensorflow-gpu
注:不要使用pip安装!不要使用pip安装!不要使用pip安装!
这里没有选择 tensorflow-gpu 版本,conda 自动下载了 tensorflow-gpu==2.4.1 (版本对应可以查看 Build from source | TensorFlow)。
2.测试
执行如下两个命令即可:
(Tensorflow2.4) name@eclab:/usr/local$ python
Python 3.9.17 (main, Jul 5 2023, 20:41:20)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2023-07-10 10:22:16.571135: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
>>> tf.config.list_physical_devices('GPU')
2023-07-10 10:22:27.565493: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2023-07-10 10:22:27.567453: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2023-07-10 10:22:27.611185: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
2023-07-10 10:22:27.612680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties:
pciBusID: 0000:03:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
2023-07-10 10:22:27.613857: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 2 with properties:
pciBusID: 0000:82:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
2023-07-10 10:22:27.614783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 3 with properties:
pciBusID: 0000:83:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
2023-07-10 10:22:27.614821: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2023-07-10 10:22:27.617316: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2023-07-10 10:22:27.617370: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2023-07-10 10:22:27.619509: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2023-07-10 10:22:27.619882: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2023-07-10 10:22:27.622449: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2023-07-10 10:22:27.623913: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2023-07-10 10:22:27.629319: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2023-07-10 10:22:27.644606: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0, 1, 2, 3
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]
(四)使用 pip 安装的问题
下面演示使用 pip 安装的话存在的问题。
(base) name@eclab:~$ conda create -n Tensorflow-err python=3.9
(Tensorflow-err) name@eclab:~$ pip install tensorflow-gpu==2.4.1
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==2.4.1 (from versions: 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0, 2.6.1, 2.6.2, 2.6.3, 2.6.4, 2.6.5, 2.7.0rc0, 2.7.0rc1, 2.7.0, 2.7.1, 2.7.2, 2.7.3, 2.7.4, 2.8.0rc0, 2.8.0rc1, 2.8.0, 2.8.1, 2.8.2, 2.8.3, 2.8.4, 2.9.0rc0, 2.9.0rc1, 2.9.0rc2, 2.9.0, 2.9.1, 2.9.2, 2.9.3, 2.10.0rc0, 2.10.0rc1, 2.10.0rc2, 2.10.0rc3, 2.10.0, 2.10.1, 2.11.0rc0, 2.11.0rc1, 2.11.0rc2, 2.11.0, 2.12.0)
ERROR: No matching distribution found for tensorflow-gpu==2.4.1
首先安装不了2.4.1,根据提示,选择安装2.5.0;
pip install tensorflow-gpu==2.5
使用步骤(三)中的 2.测试方法:
(Tensorflow-err) name@eclab:~$ python
Python 3.9.17 (main, Jul 5 2023, 20:41:20)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2023-07-10 10:38:37.238756: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
>>> tf.config.list_physical_devices('GPU')
2023-07-10 10:38:40.413250: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2023-07-10 10:38:40.456066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
2023-07-10 10:38:40.457549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:03:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
2023-07-10 10:38:40.458707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 2 with properties:
pciBusID: 0000:82:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
2023-07-10 10:38:40.459651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 3 with properties:
pciBusID: 0000:83:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
2023-07-10 10:38:40.459700: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2023-07-10 10:38:40.464266: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2023-07-10 10:38:40.464333: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2023-07-10 10:38:40.465775: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2023-07-10 10:38:40.466117: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2023-07-10 10:38:40.467045: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2023-07-10 10:38:40.468303: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2023-07-10 10:38:40.468555: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.3/lib64
2023-07-10 10:38:40.468578: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
错误内容:
tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.3/lib64