0. Problem description:
When installing the driver on an Atlas800-9000 server, the installer may fail with the error: Dkms install failed, details in : /var/log/ascend_seclog/ascend_install.log, as shown below:
[root@localhost ~]# ./Ascend-hdk-910-npu-driver_23.0.rc3_linux-aarch64.run --full --install-for-all
Verifying archive integrity... 100% SHA256 checksums are OK. All good.
Uncompressing ASCEND DRIVER RUN PACKAGE 100%
[Driver] [2023-12-09 23:55:45] [INFO]Start time: 2023-12-09 23:55:45
[Driver] [2023-12-09 23:55:45] [INFO]LogFile: /var/log/ascend_seclog/ascend_install.log
[Driver] [2023-12-09 23:55:45] [INFO]OperationLogFile: /var/log/ascend_seclog/operation.log
[Driver] [2023-12-09 23:55:45] [INFO]base version is none.
[Driver] [2023-12-09 23:55:45] [WARNING]Do not power off or restart the system during the installation/upgrade
[Driver] [2023-12-09 23:55:45] [INFO]set username and usergroup, HwHiAiUser:HwHiAiUser
/usr/local/Ascend/driver/tools/upgrade-tool: error while loading shared libraries: libdrvdsmi_host.so: cannot open shared object file: No such file or directory
[Driver] [2023-12-09 23:56:42] [INFO]driver install type: DKMS
[Driver] [2023-12-09 23:56:42] [INFO]upgradePercentage:10%
[Driver] [2023-12-09 23:56:49] [INFO]upgradePercentage:30%
[Driver] [2023-12-09 23:56:49] [INFO]upgradePercentage:40%
[Driver] [2023-12-09 23:56:56] [ERROR]Dkms install failed, details in : /var/log/ascend_seclog/ascend_install.log
[Driver] [2023-12-09 23:56:56] [ERROR]Driver_ko_install failed, details in : /var/log/ascend_seclog/ascend_install.log
[Driver] [2023-12-09 23:56:56] [INFO]Failed to install driver package, please retry after uninstall and reboot!
[Driver] [2023-12-09 23:56:56] [INFO]End time: 2023-12-09 23:56:56
[root@localhost ~]# vim /var/log/ascend_seclog/ascend_install.log
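Rather than paging through the whole log in vim, it can help to pull out only the lines that mention a failure. The sketch below assumes that failures in the log contain the words "error" or "fail" (the internal format of ascend_install.log is not shown in this post); show_install_errors is a hypothetical helper name, not part of the driver package.

```shell
# Hypothetical helper: print the last 20 log lines that mention an error or failure.
# Assumption: failure lines contain "error" or "fail" (matched case-insensitively).
show_install_errors() {
    log_file="${1:-/var/log/ascend_seclog/ascend_install.log}"
    # -n prefixes each match with its line number so it can be found again in vim.
    grep -n -i -E 'error|fail' "$log_file" | tail -n 20
}

# Usage (as root, after a failed install):
#   show_install_errors /var/log/ascend_seclog/ascend_install.log
```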
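The most likely cause is that the driver does not support the kernel version the server is currently running. The fix is to boot an older, supported kernel; check the driver's kernel compatibility list for the versions it supports.
First, check the current kernel version with uname -r: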
4.18.0-348.20.1.el7.aarch64
List the installed kernel packages with:
rpm -qa |grep kernel
Output:
kernel-modules-4.18.0-348.20.1.el7.aarch64
kernel-4.14.0-115.el7a.0.1.aarch64 # this is the supported version; a 4.18 kernel was installed as well, so the boot kernel has to be switched back to this one
kernel-headers-4.18.0-348.20.1.el7.aarch64
kernel-devel-4.18.0-348.20.1.el7.aarch64
kernel-4.18.0-348.20.1.el7.aarch64
kernel-tools-4.18.0-348.20.1.el7.aarch64
kernel-tools-libs-4.18.0-348.20.1.el7.aarch64
kernel-core-4.18.0-348.20.1.el7.aarch64
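The package list mixes the kernels themselves with their companion packages (devel, headers, tools). A minimal sketch for reducing it to the kernel versions you could actually boot, assuming package names follow the kernel-<version> pattern seen above; installed_kernel_versions is a hypothetical helper name.

```shell
# Hypothetical helper: read `rpm -qa` output on stdin and print the version of
# every base "kernel" package (kernel-devel, kernel-headers, etc. are skipped
# because the text after "kernel-" must start with a digit).
installed_kernel_versions() {
    sed -n 's/^kernel-\([0-9][^ ]*\).*$/\1/p'
}

# Usage on the server:
#   rpm -qa | installed_kernel_versions
```

If the supported version does not appear in this list, it has to be installed before the boot default can be switched to it.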
1. Check the boot menu entries
cat /boot/grub2/grub.cfg |grep menuentry
Output:
if [ x"${feature_menuentry_id}" = xy ]; then
menuentry_id_option="--id"
menuentry_id_option=""
export menuentry_id_option
menuentry 'CentOS Linux (4.18.0-348.20.1.el7.aarch64) 7 (AltArch)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-4.18.0-348.20.1.el7.aarch64-advanced-720f46b4-ad98-426c-962f-3a77ce8f01a9' {
menuentry 'CentOS Linux (4.14.0-115.el7a.0.1.aarch64) 7 (AltArch)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-4.14.0-115.el7a.0.1.aarch64-advanced-720f46b4-ad98-426c-962f-3a77ce8f01a9' {
menuentry 'CentOS Linux (0-rescue-f5d62bf864c94b9a9860cc8775ffdd7d) 7 (AltArch)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-0-rescue-f5d62bf864c94b9a9860cc8775ffdd7d-advanced-720f46b4-ad98-426c-962f-3a77ce8f01a9' {
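The raw grep output is hard to read because each line carries all the --class options. A small sketch that prints only the entry titles with their menu index, assuming top-level entries start at the beginning of the line as in the output above; list_menuentries is a hypothetical helper name.

```shell
# Hypothetical helper: print "index : title" for each top-level menuentry.
# Reads the files given as arguments, or stdin when none are given.
list_menuentries() {
    # Split on single quotes so that field 2 is the quoted entry title.
    # Lines such as menuentry_id_option="" do not match because the pattern
    # requires a space right after "menuentry".
    awk -F"'" '/^menuentry / { printf "%d : %s\n", n++, $2 }' "$@"
}

# Usage on the server:
#   list_menuentries /boot/grub2/grub.cfg
```

The title printed here is the exact string that grub2-set-default expects.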
2. Run grub2-mkconfig -o /boot/grub2/grub.cfg to see which kernels are available:
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.18.0-348.20.1.el7.aarch64
Found initrd image: /boot/initramfs-4.18.0-348.20.1.el7.aarch64.img
Found linux image: /boot/vmlinuz-4.14.0-115.el7a.0.1.aarch64
Found initrd image: /boot/initramfs-4.14.0-115.el7a.0.1.aarch64.img
Found linux image: /boot/vmlinuz-0-rescue-f5d62bf864c94b9a9860cc8775ffdd7d
Found initrd image: /boot/initramfs-0-rescue-f5d62bf864c94b9a9860cc8775ffdd7d.img
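Note that grub2-mkconfig regenerates grub.cfg as a side effect. If the goal is only to see which kernel images exist, a read-only look at /boot gives the same information. The sketch below assumes kernel images are named vmlinuz-<version>, as in the output above; list_boot_kernels is a hypothetical helper name.

```shell
# Hypothetical helper: list kernel versions by looking at vmlinuz-* files.
# Nothing is modified; the directory defaults to /boot.
list_boot_kernels() {
    boot_dir="${1:-/boot}"
    for img in "$boot_dir"/vmlinuz-*; do
        # Skip the unexpanded pattern when no image matches.
        [ -e "$img" ] || continue
        name="${img##*/}"                 # strip the directory part
        printf '%s\n' "${name#vmlinuz-}"  # strip the "vmlinuz-" prefix
    done
}

# Usage on the server:
#   list_boot_kernels /boot
```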
3. Change the default boot kernel
grub2-set-default 'CentOS Linux (4.14.0-115.el7a.0.1.aarch64) 7 (AltArch)' # replace 4.14.0-115.el7a.0.1.aarch64 in the middle with your own kernel version
Run grub2-mkconfig -o /boot/grub2/grub.cfg so that the configuration takes effect
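Typing the full entry title by hand is error-prone, since it must match grub.cfg character for character. A minimal sketch that looks the title up from the kernel version instead, assuming the title contains the version in parentheses as in the entries shown earlier; menuentry_title_for_kernel is a hypothetical helper name.

```shell
# Hypothetical helper: print the title of the first top-level menuentry whose
# title contains "(<kernel_version>)". Prints nothing if there is no match.
menuentry_title_for_kernel() {
    kernel_version="$1"
    grub_cfg="${2:-/boot/grub2/grub.cfg}"
    # index() does a literal substring search, so the dots in the version
    # are not treated as regex wildcards.
    awk -F"'" -v ver="$kernel_version" \
        '/^menuentry / && index($2, "(" ver ")") { print $2; exit }' "$grub_cfg"
}

# Usage (as root):
#   title="$(menuentry_title_for_kernel 4.14.0-115.el7a.0.1.aarch64)"
#   [ -n "$title" ] && grub2-set-default "$title"
```

The saved default is only honored when GRUB_DEFAULT=saved is set in /etc/default/grub, so that file is worth checking if the change does not stick.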
4. Verify the change
Run grub2-editenv list to check the result:
saved_entry=CentOS Linux (4.14.0-115.el7a.0.1.aarch64) 7 (AltArch)
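If this check is part of a script, the same verification can be done without reading the output by eye. The sketch below assumes the saved_entry line has the form shown above, with the kernel version in parentheses; saved_entry_has_kernel is a hypothetical helper name.

```shell
# Hypothetical helper: read `grub2-editenv list` output on stdin and succeed
# (exit status 0) only if saved_entry names the given kernel version.
saved_entry_has_kernel() {
    want="$1"
    # Keep only the value of saved_entry, then do a fixed-string match.
    sed -n 's/^saved_entry=//p' | grep -F -q "($want)"
}

# Usage (as root):
#   grub2-editenv list | saved_entry_has_kernel 4.14.0-115.el7a.0.1.aarch64 \
#       && echo "default kernel is set" || echo "default kernel is NOT set"
```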
5. Reboot
Once the reboot has finished, run uname -r:
4.14.0-115.el7a.0.1.aarch64
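Before running the driver installer again, it may be worth guarding against the machine having come back up on the wrong kernel. A minimal sketch; running_kernel_is is a hypothetical helper name, and the optional second argument exists only so the check can be exercised without rebooting.

```shell
# Hypothetical helper: succeed only if the running kernel equals the expected
# version. The second argument overrides `uname -r` (useful for testing).
running_kernel_is() {
    expected="$1"
    actual="${2:-$(uname -r)}"
    [ "$actual" = "$expected" ]
}

# Usage:
#   if running_kernel_is 4.14.0-115.el7a.0.1.aarch64; then
#       echo "kernel OK, the driver install can be retried"
#   else
#       echo "still on $(uname -r); check the GRUB default entry" >&2
#   fi
```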
Done~