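- Driver installation
- 1. Update the system
- 2. NVIDIA GPU installation
- Check whether the system has an NVIDIA GPU
- 2.1 First, update the DNF package repository cache with the following command:
- 2.2 Install the dependencies and build tools needed to compile the NVIDIA kernel modules
- 2.3 Add the official NVIDIA CUDA package repository on CentOS Stream 9
- 2.4 Install the latest NVIDIA GPU driver on CentOS Stream 9
- 2.5 Reboot the computer for the changes to take effect:
- 2.6 Test
- 2. cuda-toolkit installation
- 2.1 Installation
- 2.2 Environment configuration
- Test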
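Driver installation
Reference: centos-stream-9-上安装-nvidia-驱动程序 (Installing the NVIDIA driver on CentOS Stream 9)
1. Update the system
First, make sure your system is up to date: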
sudo dnf update -y
2. NVIDIA GPU installation
Check whether the system has an NVIDIA GPU
You can check whether your computer has an NVIDIA GPU installed with the following command:
lspci | egrep 'VGA|3D'
As the output below shows, my computer has an NVIDIA GeForce RTX 3060 GPU installed. Yours may have a different NVIDIA GPU.
[root@cheng ~]# lspci | egrep 'VGA|3D'
06:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)
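If you want to script this check rather than read the output by eye, the filter above can be wrapped in a small function. This is only a sketch: has_nvidia_gpu is a hypothetical helper name, and it assumes the vendor string "NVIDIA" appears on the device line as it does in my output.

```shell
# Sketch: classify lspci output read from stdin.
# has_nvidia_gpu is a hypothetical helper, not part of any package.
has_nvidia_gpu() {
    # Keep only display-class devices, then look for the NVIDIA vendor string.
    if grep -E 'VGA|3D' | grep -qi 'nvidia'; then
        echo "yes"
    else
        echo "no"
    fi
}

# Usage on a real machine:
#   lspci | has_nvidia_gpu
```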
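By default, CentOS Stream 9 uses the open-source Nouveau GPU driver rather than the proprietary NVIDIA GPU driver. Once the proprietary driver is installed, lsmod lists the nvidia modules instead of nouveau. The output below was captured on my machine after the proprietary driver was already installed, so only nvidia modules appear.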
lsmod | grep nouveau
lsmod | grep nvidia
[root@cheng ~]# lsmod | grep nouveau
lsmod | grep nvidia
nvidia_drm 143360 0
nvidia_modeset 1421312 1 nvidia_drm
nvidia_uvm 3899392 0
nvidia 70721536 2 nvidia_uvm,nvidia_modeset
video 77824 1 nvidia_modeset
drm_kms_helper 274432 2 nvidia_drm
drm 782336 4 drm_kms_helper,nvidia,nvidia_drm
[root@cheng ~]# lsmod | grep nvidia
lsmod | grep nouveau
nvidia_drm 143360 0
nvidia_modeset 1421312 1 nvidia_drm
nvidia_uvm 3899392 0
nvidia 70721536 2 nvidia_uvm,nvidia_modeset
video 77824 1 nvidia_modeset
drm_kms_helper 274432 2 nvidia_drm
drm 782336 4 drm_kms_helper,nvidia,nvidia_drm
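The two lsmod checks can be folded into a single pass that names the driver in use. The sketch below assumes the module name is in the first column of lsmod output, as in the listing above; active_gpu_driver is a hypothetical helper name.

```shell
# Sketch: report which GPU kernel driver appears in lsmod output (stdin).
# active_gpu_driver is a hypothetical helper name.
active_gpu_driver() {
    # The first column of lsmod is the module name.
    awk '
        $1 == "nvidia"  { nvidia = 1 }
        $1 == "nouveau" { nouveau = 1 }
        END {
            if (nvidia && nouveau) print "both"
            else if (nvidia)       print "nvidia"
            else if (nouveau)      print "nouveau"
            else                   print "none"
        }'
}

# Usage on a real machine:
#   lsmod | active_gpu_driver
```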
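Disable Secure Boot in the BIOS
For the NVIDIA GPU driver to work on CentOS Stream 9, Secure Boot must be disabled in the motherboard's BIOS if the motherboard boots the operating system through UEFI firmware.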
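Before rebooting into the firmware setup, you can check the current Secure Boot state from the running system. The sketch below assumes the mokutil package is installed and that `mokutil --sb-state` prints a line containing "SecureBoot enabled" or "SecureBoot disabled"; secure_boot_state is a hypothetical helper name.

```shell
# Sketch: interpret the text printed by `mokutil --sb-state` (read from stdin).
# secure_boot_state is a hypothetical helper name.
secure_boot_state() {
    case "$(cat)" in
        *"SecureBoot enabled"*)  echo "enabled" ;;
        *"SecureBoot disabled"*) echo "disabled" ;;
        # Anything else, for example a legacy BIOS boot without EFI variables.
        *)                       echo "unknown" ;;
    esac
}

# Usage on a real machine:
#   mokutil --sb-state 2>&1 | secure_boot_state
```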
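Enable the EPEL repository on CentOS Stream 9
To install the NVIDIA GPU driver on CentOS Stream 9, you must install the build tools and dependency libraries needed to compile the NVIDIA kernel modules. Some of them are available in the CentOS Stream 9 EPEL repository.
This section shows how to enable the EPEL repository on CentOS Stream 9.
2.1 First, update the DNF package repository cache with the following command: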
sudo dnf makecache
Enable the official CentOS Stream 9 CRB package repository with the following command:
sudo dnf config-manager --set-enabled crb
Install the epel-release and epel-next-release packages with the following command:
sudo dnf install epel-release epel-next-release
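To confirm the installation, press Y, then press Enter.
To accept the GPG key, press Y, then press Enter.
The epel-release and epel-next-release packages should now be installed and the EPEL repository enabled.
For the changes to take effect, update the DNF package repository cache with the following command: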
sudo dnf makecache
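To confirm that the repositories from this section are actually enabled, you can compare the output of `dnf repolist --enabled` against the repo ids you expect. The sketch below assumes the ids are crb, epel and epel-next and that the id is the first column of the listing; missing_repos is a hypothetical helper name.

```shell
# Sketch: print every expected repo id that is absent from a
# `dnf repolist --enabled` listing read from stdin.
# missing_repos is a hypothetical helper name.
missing_repos() {
    listing=$(cat)
    for repo in "$@"; do
        # The repo id is the first whitespace-separated column.
        if ! printf '%s\n' "$listing" |
            awk -v id="$repo" '$1 == id { found = 1 } END { exit !found }'; then
            echo "$repo"
        fi
    done
}

# Usage on a real machine (no output means nothing is missing):
#   dnf repolist --enabled | missing_repos crb epel epel-next
```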
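2.2 Install the dependencies and build tools needed to compile the NVIDIA kernel modules
To install the build tools and dependency libraries needed to compile the NVIDIA kernel modules, run the following command: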
sudo dnf install kernel-headers-$(uname -r) kernel-devel-$(uname -r) tar bzip2 make automake gcc gcc-c++ pciutils elfutils-libelf-devel libglvnd-opengl libglvnd-glx libglvnd-devel acpid pkgconfig dkms
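To confirm the installation, press Y, then press Enter.
The required packages are downloaded from the internet, which takes a while.
Once the packages are downloaded, you are asked to confirm the GPG key of the official CentOS package repository.
To accept the GPG key, press Y, then press Enter.
To accept the GPG key of the EPEL repository, press Y, then press Enter.
The installation should then continue.
At this point, the dependency libraries and build tools needed to compile the NVIDIA kernel modules should be installed.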
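The kernel-headers and kernel-devel packages have to match the running kernel exactly, which is why the install command uses $(uname -r). If the system update in step 1 pulled in a newer kernel and you have not rebooted yet, the two can drift apart. A minimal sketch for checking this follows; kernel_build_packages is a hypothetical helper name.

```shell
# Sketch: print the kernel package names that must match a kernel release.
# kernel_build_packages is a hypothetical helper; with no argument it
# falls back to the release of the running kernel.
kernel_build_packages() {
    release=${1:-$(uname -r)}
    printf 'kernel-headers-%s\nkernel-devel-%s\n' "$release" "$release"
}

# Usage on a real machine: rpm reports any package that is not installed.
#   kernel_build_packages | xargs rpm -q
```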
2.3 Add the official NVIDIA CUDA package repository on CentOS Stream 9
To add the official NVIDIA CUDA package repository on CentOS Stream 9, run the following command:
sudo dnf config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/rhel9/$(uname -i)/cuda-rhel9.repo
For the changes to take effect, update the DNF package repository cache with the following command:
sudo dnf makecache
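The repository URL depends on the machine architecture. The sketch below builds it explicitly so you can inspect it before passing it to dnf. It uses uname -m where the command above uses uname -i, and it assumes that NVIDIA publishes the 64-bit ARM repository under the directory name sbsa; please verify both against the download site. cuda_repo_url is a hypothetical helper name.

```shell
# Sketch: build the CUDA repo URL for a given architecture.
# cuda_repo_url is a hypothetical helper; the aarch64 -> sbsa mapping
# is an assumption to verify against NVIDIA's download site.
cuda_repo_url() {
    arch=${1:-$(uname -m)}
    case "$arch" in
        x86_64)  dir=x86_64 ;;
        aarch64) dir=sbsa ;;
        *) echo "unsupported architecture: $arch" >&2; return 1 ;;
    esac
    printf 'http://developer.download.nvidia.com/compute/cuda/repos/rhel9/%s/cuda-rhel9.repo\n' "$dir"
}

# Usage on a real machine:
#   sudo dnf config-manager --add-repo "$(cuda_repo_url)"
```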
2.4 Install the latest NVIDIA GPU driver on CentOS Stream 9
To install the latest version of the NVIDIA GPU driver on CentOS Stream 9, run the following command:
sudo dnf module install nvidia-driver:latest-dkms
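To confirm the installation, press Y, then press Enter.
All NVIDIA GPU driver packages and their dependencies are downloaded from the internet, which takes a while.
Once the packages are downloaded, you are asked to confirm the GPG key of the official NVIDIA package repository. Press Y, then press Enter to accept it.
The installation should then continue. It takes a while to complete.
This step failed for me with the following error: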
Last metadata expiration check: 0:05:51 ago on Fri 11 Apr 2025 03:30:46 PM CST.
Error:
Problem 1: package nvidia-driver-libs-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires egl-wayland(x86-64) >= 1.1.13.1-3, but none of the providers can be installed
- cannot install the best candidate for the job
- package egl-wayland-1.1.13.1-3.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
- package egl-wayland-1.1.19~20250313gitf1fd514-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
Problem 2: package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-driver-libs(x86-64) = 3:570.124.06, but none of the providers can be installed
- package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-glvkspirv.so.570.124.06()(64bit), but none of the providers can be installed
- package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-gpucomp.so.570.124.06()(64bit), but none of the providers can be installed
- package nvidia-driver-libs-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires egl-wayland(x86-64) >= 1.1.13.1-3, but none of the providers can be installed
- cannot install the best candidate for the job
- package egl-wayland-1.1.13.1-3.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
- package egl-wayland-1.1.19~20250313gitf1fd514-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
Problem 3: package xorg-x11-nvidia-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-glcore.so.570.124.06()(64bit), but none of the providers can be installed
- package xorg-x11-nvidia-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-tls.so.570.124.06()(64bit), but none of the providers can be installed
- package nvidia-xconfig-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires xorg-x11-nvidia(x86-64) >= 3:570.124.06, but none of the providers can be installed
- package nvidia-driver-libs-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires egl-wayland(x86-64) >= 1.1.13.1-3, but none of the providers can be installed
- cannot install the best candidate for the job
- package egl-wayland-1.1.13.1-3.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
- package egl-wayland-1.1.19~20250313gitf1fd514-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
Problem 4: package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-driver-libs(x86-64) = 3:570.124.06, but none of the providers can be installed
- package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-glvkspirv.so.570.124.06()(64bit), but none of the providers can be installed
- package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-gpucomp.so.570.124.06()(64bit), but none of the providers can be installed
- package nvidia-settings-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-driver(x86-64) = 3:570.124.06, but none of the providers can be installed
- package nvidia-driver-libs-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires egl-wayland(x86-64) >= 1.1.13.1-3, but none of the providers can be installed
- cannot install the best candidate for the job
- package egl-wayland-1.1.13.1-3.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
- package egl-wayland-1.1.19~20250313gitf1fd514-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
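None of the fixes suggested by LLMs worked. I finally noticed the hint in parentheses at the end of the error log, and the following command succeeded: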
sudo dnf module install nvidia-driver:latest-dkms --skip-broken
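The error log is long, but every problem in it traces back to the same few packages being "filtered out by modular filtering". A small filter makes that easier to see. This is a sketch that assumes the message wording shown in my log; modular_filtered_packages is a hypothetical helper name.

```shell
# Sketch: list each package that dnf reports as hidden by modular filtering.
# Reads dnf error output on stdin; modular_filtered_packages is a
# hypothetical helper name.
modular_filtered_packages() {
    sed -n 's/^.*package \([^ ]*\) from [^ ]* is filtered out by modular filtering.*$/\1/p' |
        sort -u
}

# Usage on a real machine:
#   sudo dnf module install nvidia-driver:latest-dkms 2>&1 | modular_filtered_packages
```

On the log above this prints only the two egl-wayland packages, which is the dependency dnf could not satisfy.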
2.5 Reboot the computer for the changes to take effect:
sudo reboot
Check that the NVIDIA driver is installed correctly
After the computer boots, you should see the proprietary NVIDIA GPU driver in use rather than the open-source Nouveau GPU driver.
lsmod | grep nvidia
lsmod | grep nouveau
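You should also find the NVIDIA X Server Settings application in the CentOS Stream 9 application menu. Click it.
NVIDIA X Server Settings should run without errors and display a large amount of information about your installed NVIDIA GPU.
2.6 Test
You should also be able to run the NVIDIA command-line programs, such as nvidia-smi.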
[root@cheng ~]# nvidia-smi
Sun Dec 22 14:37:55 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01 Driver Version: 565.57.01 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3060 Off | 00000000:06:00.0 Off | N/A |
| 31% 23C P8 6W / 170W | 18MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
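For scripts, the table is awkward to parse. nvidia-smi can also print selected fields as CSV through its --query-gpu and --format options. The sketch below extracts the driver version from one such line; it assumes the query order name,driver_version,memory.total and a ", " field separator, and gpu_driver_version is a hypothetical helper name.

```shell
# Sketch: pull the driver version out of one CSV line produced by
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
# gpu_driver_version is a hypothetical helper name.
gpu_driver_version() {
    # Fields are separated by ", "; the second one is the driver version.
    awk -F', ' '{ print $2 }'
}

# Usage on a real machine:
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader | gpu_driver_version
```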
2. cuda-toolkit installation
2.1 Installation
Refer to the official site: CUDA Toolkit 12.8 Update 1 Downloads
2.2 Environment configuration
Global configuration, effective for all users:
[chenfeng@iZ2ze8ss1mj33afx13mulcZ temp]$ sudo vim /etc/profile
Append to the end of the file:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Then restart the terminal, or run source /etc/profile.
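One thing to watch with the two export lines: every time the profile is sourced again they prepend the same directory once more, and if LD_LIBRARY_PATH was empty the result ends with a stray colon, which the dynamic loader treats as the current directory. A guarded prepend avoids both. This is a sketch, not what the article's setup uses; prepend_path is a hypothetical helper name.

```shell
# Sketch: prepend a directory to a colon-separated list only if it is
# not already there, without leaving a trailing colon.
# prepend_path is a hypothetical helper name.
prepend_path() {
    dir=$1
    list=${2-}
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;                  # already present
        *)          printf '%s\n' "$dir${list:+:$list}" ;;    # add in front
    esac
}

# Usage in a profile script:
#   export PATH=$(prepend_path /usr/local/cuda/bin "$PATH")
#   export LD_LIBRARY_PATH=$(prepend_path /usr/local/cuda/lib64 "${LD_LIBRARY_PATH-}")
```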
Test
nvcc --version
[chenfeng@iZ2ze8ss1mj33afx13mulcZ temp]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
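If a script needs to know which toolkit release is on the PATH, the release number can be taken from the "release" line of this output. The sketch below assumes the line format shown above; nvcc_release is a hypothetical helper name.

```shell
# Sketch: extract the toolkit release (for example 12.8) from the output
# of `nvcc --version` read from stdin.
# nvcc_release is a hypothetical helper name.
nvcc_release() {
    sed -n 's/^.*release \([0-9][0-9.]*\),.*$/\1/p'
}

# Usage on a real machine:
#   nvcc --version | nvcc_release
```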