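Notes on passing an A40 GPU through to an Ubuntu 20.04 VM on ESXi and installing the driver
Background
A PowerEdge R750 server (virtualized with ESXi) already had a T4 GPU, and an A40 GPU was added later. An Ubuntu 20.04 VM with the A40 passed through would not power on!
Once the power-on problem was solved, installing the GPU driver also failed with all sorts of errors!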
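1. Problems when creating the VM on ESXi
1.1 The VM was created with Ubuntu 20.04 as the guest OS and would not power on.
The error message (from the screenshot) was:
Module "DevicePowerOn" power on failed. Failed to start the virtual machine.
Reference for the fix: the article "DevicePowerOn error when enabling GPU passthrough (PCI device passthrough) on VMware ESXi"
Fix:
In ESXi, edit the VM, open Advanced, then Edit Configuration.
Add the following two parameters: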
pciPassthru.64bitMMIOSizeGB:192
pciPassthru.use64bitMMIO:TRUE
After saving these settings, the VM powers on normally.
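The two parameters above are stored in the VM's .vmx file as key = "value" lines, so they can also be set from a shell. The following is only a minimal sketch under assumptions: set_vmx_option is a hypothetical helper name (not a VMware tool), the datastore path in the comment is a placeholder, and the file should only be edited while the VM is powered off.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not part of ESXi):
# set KEY = "VALUE" in a .vmx file, replacing any existing line for that key.
set_vmx_option() {
    vmx_file=$1
    key=$2
    value=$3
    tmp_file="${vmx_file}.tmp.$$"
    # Copy every line except an existing entry for this key
    # (keys are compared case-insensitively, whitespace around the key is ignored).
    awk -v k="$key" '
        BEGIN { k = tolower(k) }
        {
            name = tolower($0)
            sub(/=.*/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name != k) print
        }
    ' "$vmx_file" > "$tmp_file" || return 1
    # Append the new value and swap the file into place.
    printf '%s = "%s"\n' "$key" "$value" >> "$tmp_file"
    mv "$tmp_file" "$vmx_file"
}

# Example call (placeholder path, adjust to the real datastore and VM name):
#   set_vmx_option /vmfs/volumes/DATASTORE/VM/VM.vmx pciPassthru.use64bitMMIO TRUE
#   set_vmx_option /vmfs/volumes/DATASTORE/VM/VM.vmx pciPassthru.64bitMMIOSizeGB 192
```

Running the helper twice with the same key leaves a single entry, which avoids duplicate keys in the configuration file.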
1.2 Newer driver versions failed to install with various errors; older versions installed, but nvidia-smi could not see the GPU.
Error messages:
ERROR: Unable to load the 'nvidia-drm' kernel module.
Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs
from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device
installed in this system is supported by this NVIDIA Linux graphics driver release.
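The message above names nouveau as one possible cause, so it is worth ruling out before digging further. A minimal sketch, assuming a Linux guest: module_loaded is a hypothetical helper name, and the second argument exists only so the check can be pointed at a file other than /proc/modules.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the named kernel module is listed
# in /proc/modules, fail (exit 1) otherwise.
module_loaded() {
    name=$1
    modules_file=${2:-/proc/modules}
    # The first column of /proc/modules is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Example:
#   if module_loaded nouveau; then echo "nouveau is loaded"; fi
```

If nouveau is not loaded, `lspci | grep -i nvidia` shows whether the guest sees the passed-through device at all, and `dmesg | grep -i nvidia` usually contains the kernel-side reason for the module load failure.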
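Troubleshooting steps:
1.2.1 By process of elimination, I created a Windows Server 2019 VM; after installing the GPU driver there, the GPU was recognized normally.
1.2.2 I created an Ubuntu 22.04 VM; it still produced various errors and could not recognize the GPU.
1.2.3 I created CentOS 7.9 and Anolis 7.9 VMs; they also produced various errors and would not start.
1.2.4 I checked which operating systems the GPU driver documentation lists as supported and created a Rocky Linux 8.6 VM with the GPU passed through. The driver installation reported one error; a Bing search showed that Secure Boot had to be disabled. After disabling it, the driver installed and the GPU was actually recognized!
I then compared the Rocky Linux VM with the Ubuntu VM.
Rocky Linux firmware: EFI
Ubuntu firmware: BIOS
Fix:
Change the Ubuntu VM's firmware to EFI.
After saving this change and installing the GPU driver again, wow, the GPU finally showed up!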
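Since the firmware type turned out to be the deciding difference, it helps to confirm from inside the guest which mode it actually booted in. On Linux, the directory /sys/firmware/efi exists only when the system was booted via UEFI. A minimal sketch: boot_mode is a hypothetical helper name, and its optional argument is only there so the check can be pointed at a different root directory.

```shell
#!/bin/sh
# Hypothetical helper: print "EFI" if the running Linux system was booted
# via UEFI, otherwise print "BIOS".
boot_mode() {
    root=${1:-}
    # The kernel exposes /sys/firmware/efi only on UEFI boots.
    if [ -d "${root}/sys/firmware/efi" ]; then
        echo "EFI"
    else
        echo "BIOS"
    fi
}

# Example:
#   boot_mode
```

On an EFI guest, `mokutil --sb-state` (if the mokutil package is installed) reports whether Secure Boot is enabled, which was the other thing that blocked the driver on Rocky Linux.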
2. Installing the GPU driver on Ubuntu 20.04
2.1 Install the driver management tool
apt install ubuntu-drivers-common -y
2.2 List the available GPU drivers
root@user:~# ubuntu-drivers devices
ERROR:root:could not open aplay -l
Traceback (most recent call last):
File "/usr/share/ubuntu-drivers-common/detect/sl-modem.py", line 35, in detect
aplay = subprocess.Popen(
File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'aplay'
== /sys/devices/pci0000:00/0000:00:16.0/0000:0b:00.0 ==
modalias : pci:v000010DEd00002235sv000010DEsd0000145Abc03sc02i00
vendor : NVIDIA Corporation
driver : nvidia-driver-515-open - distro non-free
driver : nvidia-driver-470 - distro non-free
driver : nvidia-driver-515-server - distro non-free
driver : nvidia-driver-525-server - distro non-free
driver : nvidia-driver-525-open - distro non-free
driver : nvidia-driver-530 - distro non-free
driver : nvidia-driver-525 - distro non-free
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-530-open - distro non-free recommended
driver : nvidia-driver-515 - distro non-free
driver : nvidia-driver-510 - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
== /sys/devices/pci0000:00/0000:00:0f.0 ==
modalias : pci:v000015ADd00000405sv000015ADsd00000405bc03sc00i00
vendor : VMware
model : SVGA II Adapter
manual_install: True
driver : open-vm-tools-desktop - distro free
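The traceback at the top of this output appears to come from the sl-modem detection script looking for the aplay command, and the GPU driver list is still printed after it. To pull the recommended package name out of the listing, a small filter like the one below can be used. This is a sketch: recommended_driver is a hypothetical helper name, and it assumes the "driver : NAME - ... recommended" line layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: read `ubuntu-drivers devices` output on stdin and
# print the package name of the first driver marked "recommended".
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3; exit }'
}

# Example (stderr is discarded to hide the aplay traceback):
#   ubuntu-drivers devices 2>/dev/null | recommended_driver
```

The recommended entry is only a suggestion; any other package from the list can be installed instead, as step 2.3 does.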
2.3 Install the GPU driver
apt -y install nvidia-driver-515
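After the package is installed (a reboot is normally needed so the new kernel module is loaded), the result can be checked with nvidia-smi. A minimal sketch: verify_gpu_driver is a hypothetical wrapper name; the nvidia-smi query options used are the standard --query-gpu and --format flags.

```shell
#!/bin/sh
# Hypothetical wrapper: print GPU name and driver version, or fail with a
# message if nvidia-smi is not on PATH.
verify_gpu_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; is the driver installed?" >&2
        return 1
    fi
    # One CSV line per GPU: name, driver version.
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}

# Example:
#   verify_gpu_driver
```

If nvidia-smi is present but reports that it cannot communicate with the driver, the kernel module did not load, which is the same symptom described in section 1.2.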