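My Alibaba Cloud server could not be reached over SSH, and the website it hosts was also failing. After I logged in to the Alibaba Cloud console, the system reported: "The system encountered a kernel panic, an OOM exception, an internal crash, or performance jitter." I asked Alibaba Cloud support, who told me to install and enable the kdump service, so I started learning about kdump.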
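What kdump is:
When the system crashes, kdump uses kexec to boot into a second kernel. This second kernel, usually called the capture kernel, boots with a very small amount of memory and captures the dump image. The first kernel reserves part of its memory for the second kernel to boot from. Because kdump starts the capture kernel through kexec, the BIOS is bypassed, so the contents of the first kernel's memory are preserved. That is the essence of a kernel crash dump.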
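Requirements for kdump to work:
1. The kdump service is enabled on the system
2. The boot configuration reserves an appropriate amount of memory for the crash kernel
CentOS 7: check the kdump status with: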
systemctl status kdump.service
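CentOS 7 ships with kdump (the kexec-tools package) installed by default. If it is missing, install it. The crash and kernel-debuginfo packages are needed later to analyze the dump (kernel-debuginfo comes from the debuginfo repository, which may need to be enabled first):
yum install kexec-tools crash kernel-debuginfo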
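Set the amount of memory reserved for the crash kernel by editing /etc/default/grub.
Find the GRUB_CMDLINE_LINUX entry and change the crashkernel value. The default is auto, and the value should be chosen to suit the server's memory size. With crashkernel=auto, a system with 8 GB of memory or less reserves nothing for the kdump kernel (equivalent to disabling kdump); a system with more than 8 GB and up to 16 GB reserves 256M (equivalent to crashkernel=256M); a system with more than 16 GB reserves 512M (equivalent to crashkernel=512M). These thresholds can differ between kernel and distribution versions, so on a small-memory server it is safer to set an explicit value (for example crashkernel=256M) than to rely on auto.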
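The edit described above can also be scripted. The following is only a sketch: set_crashkernel is a hypothetical helper name, and it assumes the file already contains a crashkernel= option inside GRUB_CMDLINE_LINUX (as a stock CentOS 7 install does). It keeps a .bak copy of the file before rewriting it.

```shell
# set_crashkernel VALUE [FILE]
# Replace the existing crashkernel=... option with crashkernel=VALUE.
# FILE defaults to /etc/default/grub; a backup is kept as FILE.bak.
# Hypothetical helper, assumes crashkernel= is already present in FILE.
set_crashkernel() {
    value=$1
    file=${2:-/etc/default/grub}
    # Keep a copy so the original setting can be restored.
    cp -p "$file" "$file.bak" || return 1
    # The option value ends at the next space or closing double quote.
    sed "s/crashkernel=[^ \"]*/crashkernel=$value/" "$file.bak" > "$file"
}
```

For example, running set_crashkernel 256M as root changes crashkernel=auto to crashkernel=256M and leaves the rest of the line untouched.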
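3. Regenerate the grub configuration file and reboot for the change to take effect (on UEFI systems the generated file is /boot/efi/EFI/centos/grub.cfg instead):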
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
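After the reboot it is worth confirming that the reservation actually happened. A minimal sketch, assuming the kernel exposes /proc/cmdline and /sys/kernel/kexec_crash_size (the function names crashkernel_arg and reserved_mib are my own, not part of kexec-tools):

```shell
# crashkernel_arg [CMDLINE_FILE]
# Print the crashkernel= value from a kernel command line.
# Defaults to the running kernel's /proc/cmdline.
crashkernel_arg() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^crashkernel=//p'
}

# reserved_mib [SIZE_FILE]
# Print the memory reserved for the crash kernel, in MiB.
# The sysfs file holds the size in bytes; 0 means nothing was reserved.
reserved_mib() {
    bytes=$(cat "${1:-/sys/kernel/kexec_crash_size}") || return 1
    echo $(( bytes / 1024 / 1024 ))
}
```

Calling crashkernel_arg and then reserved_mib with no arguments on the rebooted server shows the configured value and what the kernel really set aside. If reserved_mib prints 0, kdump has no memory to load the capture kernel into.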
4. Start the kdump service:
systemctl start kdump.service    # start kdump now
systemctl enable kdump.service   # start kdump automatically at boot
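5. Run systemctl status kdump.service to check whether the kdump service is running, or run:
systemctl is-active kdump.service
If it prints active, kdump has started successfully. (Older init-script systems print Starting kdump: [OK] when the service starts.)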
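The service state alone does not prove that a capture kernel is loaded. As a rough extra check, the sketch below reads /sys/kernel/kexec_crash_loaded, which contains 1 once kexec has loaded the capture kernel. kdump_ready and report_kdump are hypothetical helper names.

```shell
# kdump_ready [LOADED_FILE]
# Succeed only if the capture kernel is loaded (file contains 1).
# Defaults to /sys/kernel/kexec_crash_loaded.
kdump_ready() {
    loaded_file=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$loaded_file" ] && [ "$(cat "$loaded_file")" = "1" ]
}

# report_kdump [LOADED_FILE]
# Print a one-line, human-readable result of the check above.
report_kdump() {
    if kdump_ready "$1"; then
        echo "capture kernel loaded"
    else
        echo "capture kernel NOT loaded"
    fi
}
```

On the server this can be combined with the service check, for example: systemctl is-active kdump.service && report_kdump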
6. Manually trigger a crash dump (this crashes the running system immediately):
echo 1 >/proc/sys/kernel/sysrq; echo c > /proc/sysrq-trigger
If everything works, the system reboots automatically, and after the reboot a vmcore file can be found in a new subdirectory of /var/crash/.
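Each crash gets its own subdirectory, so after a few tests it helps to pick out the newest dump. A small sketch, assuming the default layout of /var/crash/<subdirectory>/vmcore and directory names without spaces or newlines (latest_vmcore is a hypothetical helper name):

```shell
# latest_vmcore [CRASH_DIR]
# Print the path of the most recently modified vmcore under CRASH_DIR
# (default /var/crash). Prints nothing if no dump exists yet.
latest_vmcore() {
    crash_dir=${1:-/var/crash}
    # ls -t lists newest first; head keeps only the first entry.
    ls -t "$crash_dir"/*/vmcore 2>/dev/null | head -n 1
}
```

The printed path can then be passed to crash in place of the bare vmcore file name.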
Open the dump with crash to analyze it:
# crash vmcore /usr/lib/debug/lib/modules/3.10.0-957.1.3.el7.x86_64/vmlinux
crash 7.2.3-8.el7
Copyright (C) 2002-2017 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
WARNING: kernel relocated [126MB]: patching 85619 gdb minimal_symbol values
KERNEL: /usr/lib/debug/lib/modules/3.10.0-957.1.3.el7.x86_64/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 4
DATE: Fri Jun 18 05:32:32 2021
UPTIME: 00:47:57
LOAD AVERAGE: 0.00, 0.01, 0.05
TASKS: 413
NODENAME: localhost.localdomain
RELEASE: 3.10.0-957.1.3.el7.x86_64
VERSION: #1 SMP Thu Nov 29 14:49:43 UTC 2018
MACHINE: x86_64 (3799 Mhz)
MEMORY: 2 GB
PANIC: "SysRq : Trigger a crash"
PID: 12653
COMMAND: "bash"
TASK: ffffa1071aca8000 [THREAD_INFO: ffffa1074b32c000]
CPU: 3
STATE: TASK_RUNNING (SYSRQ)
crash>
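From the crash> prompt, commands such as bt (backtrace of the panicking task), log (kernel message buffer) and ps (process list) are the usual starting points. They can also be run non-interactively. The sketch below writes those commands to a file and feeds it to crash with the -i option, with -s suppressing the startup banner. write_crash_cmds and crash_report are hypothetical helper names, and the choice of commands is only an assumption about what is useful for a first look.

```shell
# write_crash_cmds FILE
# Write a small batch of crash commands to FILE, ending with quit
# so that crash exits instead of waiting at the prompt.
write_crash_cmds() {
    cat > "$1" <<'EOF'
bt
log
ps
quit
EOF
}

# crash_report VMLINUX VMCORE
# Run the batch against a dump and print the results to stdout.
crash_report() {
    vmlinux=$1
    vmcore=$2
    cmds=$(mktemp) || return 1
    write_crash_cmds "$cmds"
    status=0
    crash -s -i "$cmds" "$vmlinux" "$vmcore" || status=$?
    rm -f "$cmds"
    return "$status"
}
```

For the dump above this would be called as: crash_report /usr/lib/debug/lib/modules/3.10.0-957.1.3.el7.x86_64/vmlinux vmcore > report.txt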