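Contents
- Problem
- Debugging & explanation
- Abnormal logs
- Why did the system enter the emergency shell?
- Why did local-fs.target fail?
- Why did storage.mount time out?
- Unit dependencies
- Why does "Login incorrect" keep scrolling once in emergency mode?
- plymouth
- systemd-sulogin-shell
- sulogin
- Solution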
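Problem
The VM does not boot properly. Normal SSH login fails because the machine has no IP address and sshd never started. The only way in is the console, where "Login incorrect" scrolls by continuously, as in the screenshot below.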
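Debugging & explanation
No debugging is possible in this state; the only option is to reboot the VM and then go through the logs of the failed boot.
Nothing turned up in any of the logs under /var/log.
The abnormal entries finally showed up in the journalctl output.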
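To narrow a long journal down to the entries that matter here, a small filter helps. This is a minimal sketch: the function name `filter_boot_failures` is made up for illustration, the patterns are simply the message fragments that appear in this incident's logs, and the commented journalctl calls assume a persistent journal (otherwise the previous boot is not available).

```shell
# Keep only the lines that explain why boot fell back to emergency mode.
# Reads journal text on stdin, so it also works on a saved log file.
filter_boot_failures() {
    grep -E -i 'failed with result|Triggering OnFailure|Emergency (Shell|Mode)'
}

# Typical use on a systemd host (not executed here):
#   journalctl --list-boots                               # find the failed boot
#   journalctl -b -1 --no-pager | filter_boot_failures    # previous boot only
```

Filtering on stdin rather than calling journalctl inside the function keeps the helper usable on log text copied off the broken machine.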
Abnormal logs
Dec 02 11:37:09 sm-a systemd[1]: Started Emergency Shell.
Dec 02 11:37:09 sm-a systemd[1]: Reached target Emergency Mode.
Dec 02 11:37:11 sm-a systemd[871]: emergency.service: Executable /bin/plymouth missing, skipping: No such file or directory
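Just as we were about to reboot it, the machine restarted on its own. systemd received SIGINT, possibly triggered by a 40-minute timer, and the system rebooted automatically (systemd running as PID 1 treats SIGINT as a Ctrl-Alt-Del request, which reboots by default).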
Dec 02 14:27:13 sm-a systemd[1]: Received SIGINT.
Why did the system enter the emergency shell?
Looking further back, local-fs.target failed to start because a unit it depends on was not ready.
Dec 02 11:37:09 sm-a systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
Dec 02 11:37:09 sm-a systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
Why did local-fs.target fail?
The storage.mount unit failed: the file system was not mounted because the mount timed out.
Dec 02 11:37:09 sbc03-oam-a systemd[1]: storage.mount: Mount process exited, code=killed status=15
Dec 02 11:37:09 sbc03-oam-a systemd[1]: storage.mount: Failed with result 'timeout'.
Why did storage.mount time out?
The system logs do not show the reason.
Unit dependencies
# cat systemd-remount-fs.service
# SPDX-License-Identifier: LGPL-2.1+
#
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
[Unit]
Description=Remount Root and Kernel File Systems
Documentation=man:systemd-remount-fs.service(8)
Documentation=https://www.freedesktop.org/wiki/Software/systemd/APIFileSystems
DefaultDependencies=no
Conflicts=shutdown.target
After=systemd-fsck-root.service
Before=local-fs-pre.target local-fs.target shutdown.target
Wants=local-fs-pre.target
ConditionPathExists=/etc/fstab
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/lib/systemd/systemd-remount-fs
# file /usr/lib/systemd/systemd-remount-fs
/usr/lib/systemd/systemd-remount-fs: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=204ac45be5b808f1f1b907fc41a612c3dccbfbcd, stripped
# cat /usr/lib/systemd/system/local-fs.target
# SPDX-License-Identifier: LGPL-2.1+
#
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
[Unit]
Description=Local File Systems
Documentation=man:systemd.special(7)
DefaultDependencies=no
Conflicts=shutdown.target
After=local-fs-pre.target
OnFailure=emergency.target
OnFailureJobMode=replace-irreversibly
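The link from storage.mount to local-fs.target normally comes from /etc/fstab: systemd-fstab-generator makes each local mount a requirement of local-fs.target unless the entry carries nofail (which downgrades it to a want) or noauto. The following is a rough sketch for listing the entries whose failure would push boot into emergency mode. It assumes storage.mount was generated from an fstab entry (the fstab itself is not shown in this post), the function name `list_boot_critical_mounts` is hypothetical, and it does not treat network file systems separately.

```shell
# Print the mount points from an fstab file that local-fs.target would
# require, i.e. entries other than / and swap without nofail or noauto.
# Usage: list_boot_critical_mounts [fstab-path]   (default: /etc/fstab)
list_boot_critical_mounts() {
    awk '
        NF < 4 || $1 ~ /^#/               { next }  # blank, short or comment line
        $2 == "/" || $3 == "swap"         { next }  # root and swap are handled separately
        $4 ~ /(^|,)(nofail|noauto)(,|$)/  { next }  # optional or manual mounts
        { print $2 }
    ' "${1:-/etc/fstab}"
}
```

If /storage appears in the output, a hang or timeout on that device is enough to fail local-fs.target and trigger its OnFailure=emergency.target.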
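Why does "Login incorrect" keep scrolling once in emergency mode?
emergency.service, which emergency.target pulls in, runs two commands:
ExecStartPre=-/bin/plymouth --wait quit
(this tells plymouthd to quit; on this machine the binary was not found)
ExecStart=-@rootlibexecdir@/systemd-sulogin-shell emergency
On this system the second command resolves to:
/usr/lib/systemd/systemd-sulogin-shell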
plymouth
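The missing binary does not really matter: the leading "-" in ExecStartPre= tells systemd to ignore the failure, and the log above shows the step was simply skipped. For reference, the package that provides it: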
[root@10 ~]# rpm -qf /bin/plymouth
plymouth-0.9.4-11.20200615git1e36e30.el8.x86_64
--wait
Wait for plymouthd to quit.
plymouth - Send commands to plymouthd
systemd-sulogin-shell
executable('systemd-sulogin-shell',
           ['src/sulogin-shell/sulogin-shell.c'],
           include_directories : includes,
           link_with : [libshared],
           install_rpath : rootlibexecdir,
           install : true,
           install_dir : rootlibexecdir)
sulogin
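The sulogin source shows how this scrolling can happen. If it reads garbage from the console, each failed attempt prints "Login incorrect" and the loop starts over, so it can easily spin forever.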
while (1) {
    const char *passwd = pwd->pw_passwd;
    const char *answer;
    int failed = 0, doshell = 0;
    int deny = !opt_e && locked_account_password(pwd->pw_passwd);

    doprompt(passwd, con, deny);

    if ((answer = getpasswd(con)) == NULL)
        break;
    if (deny)
        exit(EXIT_FAILURE);

    /* no password or locked account */
    if (!passwd[0] || locked_account_password(passwd))
        doshell++;
    else {
        const char *cryptbuf;
        cryptbuf = crypt(answer, passwd);
        if (cryptbuf == NULL)
            warn(_("crypt failed"));
        else if (strcmp(cryptbuf, pwd->pw_passwd) == 0)
            doshell++;
    }

    if (doshell) {
        sushell(pwd);
        failed++;
    }

    if (failed) {
        fprintf(stderr, _("cannot execute su shell\n\n"));
        break;
    }
    fprintf(stderr, _("Login incorrect\n\n"));
}
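The effect of that loop can be reproduced with a toy model. This is only a sketch of the control flow, not of sulogin itself: `simulate_sulogin` is a made-up name, it compares plain strings where the real code compares crypt() hashes, and it treats end of input the way the loop treats getpasswd() returning NULL.

```shell
# Toy model of the sulogin loop: one "Login incorrect" per non-matching line.
# $1 is the expected password; candidate answers are read from stdin.
simulate_sulogin() {
    sim_expected=$1
    while IFS= read -r sim_answer; do
        if [ "$sim_answer" = "$sim_expected" ]; then
            echo "shell started"      # stands in for sushell(pwd)
            return 0
        fi
        echo "Login incorrect"        # wrong answer: prompt again
    done
    return 1                          # end of input: the loop breaks
}

# Example: three lines of console noise give three rejections.
#   printf 'x\ny\nz\n' | simulate_sulogin secret
```

As long as the console keeps delivering lines that are not the password, nothing in the loop ever stops it, which matches the scrolling seen on the VM console.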
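Solution
The root cause is that the disk failed to mount, so that disk problem is what has to be fixed. Of course, once a disk misbehaves, the console may well misbehave too (problems tend to cluster).
To get past the endlessly scrolling messages, the sulogin documentation notes: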
When the user exits from the single-user shell, or presses control-D at the prompt, the system will continue to boot.