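Contents
- Preface
- Hardware and software environment
- Driver
- Downloading the driver
- Installation
- Disabling the X server
- Disabling nouveau
- Installing dependencies
- Making the installer executable and running it
- Checking the installation
- Docker configuration
- Installing Docker
- Installing nvidia-container-runtime
- Command
- Script contents
- Running the script
- Installing nvidia-container-runtime
- Verification
- Docker GPU verification
- Uninstall commands
- Summary
- Troubleshooting references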
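Preface
For computer-vision development work, I set up the NVIDIA driver and exposed it to Docker containers. This post records the process and the problems encountered along the way.
Hardware and software environment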
Static hostname: debian
Icon name: computer-desktop
Chassis: desktop
Operating System: Debian GNU/Linux 11 (bullseye)
Kernel: Linux 5.10.0-19-amd64
Architecture: x86-64
CPU: 12th Gen Intel(R) Core(TM) i7-12700F
GPU: Nvidia Quadro M2000
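Driver
Downloading the driver
Look up the driver that matches your graphics card model on the NVIDIA website. This machine uses driver version 470.161…03.
NVIDIA Driver Downloads, Official Advanced Driver Search
CUDA driver version requirements:
NVIDIA CUDA Toolkit Release Notes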
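Note!
- Installing directly with apt-get install nvidia-driver did not work (it failed with an error along the lines of "can not communicate with nvidia driver").
- Installing the latest 525 driver did not work either (same kind of error).
- The X server and nouveau must be disabled before installing [1].
Installation
Disabling the X server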
sudo service gdm3 stop
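After running this command the display drops to a text console that shows only a cursor. Press Ctrl + Alt + F1 or Ctrl + Alt + F2 to bring up a console with a login prompt for your username and password.
Disabling nouveau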
sudo bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
Reboot the computer after disabling nouveau:
sudo reboot
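After the reboot it can be worth confirming that the blacklist took effect. The following is a minimal sketch, not part of the original procedure: the helper name nouveau_loaded is made up for illustration, and it only assumes that lsmod prints the module name in the first column.

```shell
# Returns 0 (true) if the lsmod-style text on stdin lists the nouveau module.
# nouveau_loaded is a hypothetical helper name used only in this sketch.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

if ! command -v lsmod >/dev/null 2>&1; then
    echo "lsmod not available; cannot check"
elif lsmod | nouveau_loaded; then
    echo "nouveau is still loaded; check the blacklist file and reboot again"
else
    echo "nouveau is not loaded"
fi
```

If nouveau is still listed, the NVIDIA installer is likely to refuse to continue, so it is easier to catch this before running it.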
Installing dependencies
Packages needed for the build step later on [2]:
sudo apt-get install gcc g++ cmake pkg-config
sudo apt-get install linux-headers-$(uname -r|sed 's/[^-]*-[^-]*-//')
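The sed expression in the second command strips the first two dash-separated fields from the kernel release, so it is worth seeing what package name it actually produces. A small sketch, using the kernel release listed in the environment section as sample input; kernel_flavour is a hypothetical helper name:

```shell
# Strip the first two dash-separated fields from a kernel release string,
# using the same sed expression as the install command.
kernel_flavour() {
    printf '%s\n' "$1" | sed 's/[^-]*-[^-]*-//'
}

release="5.10.0-19-amd64"   # kernel release reported on the author's machine
echo "uname -r reports:  $release"
echo "package installed: linux-headers-$(kernel_flavour "$release")"
```

On this machine the command therefore installs the linux-headers-amd64 metapackage rather than a package named after the exact kernel release.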
Making the installer executable and running it
chmod +x ~/Downloads/NVIDIA-Linux-x86_64-470.161.03.run
# Must be run with administrator privileges
sudo ~/Downloads/NVIDIA-Linux-x86_64-470.161.03.run
After the installation finishes, reboot the computer and delete the blacklist file created when disabling nouveau.
Checking the installation
nvidia-smi
# Output
Thu Mar 9 14:22:29 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro M2000 Off | 00000000:01:00.0 On | N/A |
| 63% 59C P0 38W / 75W | 769MiB / 4041MiB | 30% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1835 G /usr/lib/xorg/Xorg 282MiB |
| 0 N/A N/A 1982 G /usr/bin/gnome-shell 110MiB |
| 0 N/A N/A 30799 G gnome-control-center 39MiB |
+-----------------------------------------------------------------------------+
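To check the result from a script rather than by eye, the driver version can be pulled out of the banner line. This is a rough sketch that assumes the banner keeps the "Driver Version: X" layout shown above; driver_version_from_banner is a hypothetical helper name and the expected value is the version installed in this post.

```shell
# Extract the driver version from nvidia-smi banner text read on stdin.
# driver_version_from_banner is a hypothetical helper for this sketch.
driver_version_from_banner() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*/\1/p' | head -n 1
}

expected="470.161.03"   # version installed in this post

if command -v nvidia-smi >/dev/null 2>&1; then
    actual=$(nvidia-smi | driver_version_from_banner)
    if [ "$actual" = "$expected" ]; then
        echo "driver $actual is loaded"
    else
        echo "expected $expected but nvidia-smi reports '$actual'"
    fi
else
    echo "nvidia-smi not found; the driver is probably not installed"
fi
```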
Docker configuration
Installing Docker
For installation, see the article "How to set up and use Docker".
Installing nvidia-container-runtime [3]
Command
nano nvidia-container-runtime-script.sh
Script contents
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
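The distribution variable in the script decides which repository list is downloaded, so it helps to know what it expands to. The sketch below reproduces that line against a sample file instead of the real /etc/os-release; the sample values are an assumption for Debian 11 and distribution_of is a hypothetical helper name.

```shell
# Source an os-release style file in a subshell (so its variables do not leak)
# and print ID followed by VERSION_ID, as the script does.
distribution_of() {
    ( . "$1"; echo "$ID$VERSION_ID" )
}

# Sample file with the two fields the script reads (assumed values for Debian 11).
sample=$(mktemp)
printf 'ID=debian\nVERSION_ID="11"\n' > "$sample"
echo "repository path segment: $(distribution_of "$sample")"
rm -f "$sample"
```

If the download of the .list file fails, checking this value against the directories published in the repository is a reasonable first step.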
Running the script
sh nvidia-container-runtime-script.sh
Installing nvidia-container-runtime
sudo apt-get install nvidia-container-runtime
sudo systemctl restart docker # restart Docker
Verification
which nvidia-container-runtime-hook
/usr/bin/nvidia-container-runtime-hook
Docker GPU verification
docker pull nvidia/cuda:11.3.1-base-ubuntu20.04
docker run --gpus all --rm -it nvidia/cuda:11.3.1-base-ubuntu20.04 bash
nvidia-smi
# Output like the following means it works:
root@8a57ae3075d7:/# nvidia-smi
Thu Mar 9 06:42:20 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro M2000 Off | 00000000:01:00.0 On | N/A |
| 62% 53C P0 28W / 75W | 761MiB / 4041MiB | 34% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
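The same check can be done without an interactive shell by passing nvidia-smi as the container command. A sketch under stated assumptions: the image tag is the one pulled above, gpu_check_command is a hypothetical helper, and RUN_GPU_CHECK is a made-up switch so that the command is only printed by default.

```shell
# Image tag taken from the commands above.
IMAGE="nvidia/cuda:11.3.1-base-ubuntu20.04"

# Print the one-shot command: start a container, run nvidia-smi, remove the container.
gpu_check_command() {
    printf 'docker run --gpus all --rm %s nvidia-smi\n' "$1"
}

# RUN_GPU_CHECK is a hypothetical switch for this sketch; unless it is set to 1,
# the command is printed instead of executed.
if [ "${RUN_GPU_CHECK:-0}" = "1" ]; then
    sh -c "$(gpu_check_command "$IMAGE")"
else
    gpu_check_command "$IMAGE"
fi
```

Run it with RUN_GPU_CHECK=1 on a machine where Docker and the runtime are installed; a non-zero exit status suggests the container could not reach the GPU.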
Uninstall commands
The installed driver can be removed with [4]:
sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall
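Before purging, it may help to list which NVIDIA packages apt actually installed, since the wildcard removes all of them. A minimal sketch that filters dpkg -l output; installed_nvidia_packages is a hypothetical helper name. A driver installed from the .run file is not expected to appear here, which is presumably why nvidia-uninstall is run as a separate step.

```shell
# Print names of installed (status "ii") packages whose name contains "nvidia",
# reading dpkg -l style text from stdin.
installed_nvidia_packages() {
    awk '$1 == "ii" && tolower($2) ~ /nvidia/ { print $2 }'
}

if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | installed_nvidia_packages
fi
```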
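Summary
This post records some of the problems encountered while installing the NVIDIA driver on Debian 11 and using it from Docker. Since it was written after the fact, some troubleshooting steps may be missing; feel free to leave a comment if you run into problems.
Troubleshooting references
Graphics driver error: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.
Fixing the firmware error "Possible missing firmware"
[1] Installing the NVIDIA driver on Debian: a one-stop guide to avoiding pitfalls (also applies to Ubuntu)
[2] Installing the NVIDIA graphics driver from the command line on Debian 10.2: success and a review of the problems
[3] Calling the GPU from Docker
[4] Uninstalling the NVIDIA driver on Ubuntu and installing the latest driver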