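Contents
- 1, Problem
- 1.1, rc.local
- 1.2, watchdog.py
- 2, Troubleshooting
- 2.1, Running start.sh manually works
- 2.2, At boot, rc.local runs start.sh, and the Python script launched by start.sh fails
- 2.3, Suspicion: the psutil module used by the Python script is not yet available when rc.local runs
- 3, Approach 1: keep start.sh running in the background and exit only after watchdog.py starts successfully
- 3.1, start.sh retries at most 5 times, 5 seconds apart, then exits
- 3.2, After boot, the script reports success first, and only then is the watchdog.py error printed
- 3.3, Could nohup be making start.sh ignore the exception raised when watchdog.py starts?
- 3.4, With nohup removed the error persists and the start still fails
- 4, Approach 2: check whether the process exists instead of using the return value
- 4.1, The start.sh script
- 4.1, After boot, watchdog.py still fails to start and psutil is still not found
- 4.2, The start.sh process started by rc.local belongs to root
- 4.3, The start.sh process started manually belongs to username
- 4.4, Could the failure be caused by the different users launching start.sh, one having the psutil module and the other not?
- 4.4.1, At boot, rc.local starts start.sh and the printed user name is root
- 4.4.2, Starting start.sh manually prints the current username
- 4.4.3, To trace which commands a script runs, use exec 1>>file and set -x
- 5, Approach 3: specify the user that starts watchdog.py
- 5.1, The modified start.sh
- 5.2, After rebooting, the problem is solved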
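1, Problem
The scenario: after Windows 10 boots, other services are started inside its Ubuntu subsystem.
Starting a shell script from rc.local in the Ubuntu subsystem works, but starting a Python script fails with a module-not-found error.
A Baidu search suggested the cause was that rc.local runs before the Python script's dependency libraries are available.
2023-04-20 00:46:08.1681922768 import error: No module named 'psutil'
1.1, rc.local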
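#!/bin/bash -x
LOG_FILE="/mnt/e/111111111/package/log/rc.local.log"
# Append stdout and stderr (including the -x trace) to the log file
exec 1>> "$LOG_FILE" 2>&1
set -x
# Restart apache2; sudo -S reads the password from stdin
echo password|sudo -S /etc/init.d/apache2 restart
TIMES=$(date +"%Y-%m-%d %H:%M:%S.%3N")
echo "$TIMES /etc/rc.local start apache2" >> "$LOG_FILE"
cd /xxxxx/package
# Launch start.sh in the background so that rc.local does not block
nohup ./start.sh >> "$LOG_FILE" 2>&1 &
TIMES=$(date +"%Y-%m-%d %H:%M:%S.%3N")
echo "$TIMES /etc/rc.local start start.sh" >> "$LOG_FILE"
exit 0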
1.2, watchdog.py
import subprocess
import time
import psutil
import setproctitle

# Rename this process
setproctitle.setproctitle("watchdog.py")

def checkProcess(process_name):
    for process in psutil.process_iter():
        if process.name() == process_name:
            return True
    return False

def killProcess(process_name):
    for process in psutil.process_iter():
        if process.name() == process_name:
            process.kill()

def startProces(path, program):
    param = [path + "/" + program]
    if checkProcess(program):
        killProcess(program)
        print("{} already exists and has been killed! It will be restarted in a moment!".format(program))
    process = subprocess.Popen(param)
    print("{} started! process[{}]".format(program, process))
    return process

def run(path, program):
    process = startProces(path, program)
    while True:
        time.sleep(5)  # check every 5 seconds
        if process.poll() is not None:  # the process has exited (for example, it crashed)
            process = startProces(path, program)
        else:
            print("{} is running".format(program))

if __name__ == '__main__':
    run("server", "webserver")
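The restart logic in run() relies on Popen.poll() returning None while the child is alive and its exit code once it has ended. Here is a minimal, bounded sketch of that loop using only the standard library. The name supervise and the max_restarts limit are my own additions for illustration; the original loop runs forever and also uses psutil.

```python
import subprocess
import time


def supervise(cmd, max_restarts=3, check_interval=0.2):
    """Run cmd and restart it each time it exits, at most max_restarts times.

    Returns the exit codes of every run: the first run plus each restart.
    """
    exit_codes = []
    process = subprocess.Popen(cmd)
    while True:
        time.sleep(check_interval)
        code = process.poll()  # None while the child is still running
        if code is None:
            continue
        exit_codes.append(code)
        if len(exit_codes) > max_restarts:
            return exit_codes  # restart budget used up
        process = subprocess.Popen(cmd)  # restart the child
```

Bounding the number of restarts keeps a child that fails immediately from being respawned in a tight loop, which may or may not be what you want for a long-running service.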
2, Troubleshooting
To make troubleshooting easier, I prepared a shell script, start.sh: rc.local starts start.sh, and start.sh starts the Python script.
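#!/bin/bash
LOG_FILE="/mnt/e/package/log/start.log"
# Append a timestamped message to the log file
LOG()
{
    TIMES=`date +"%Y-%m-%d %H:%M:%S.%3N"`
    echo "${TIMES}$1" >> $LOG_FILE
}
cd /mnt/e/package
nohup python3 watchdog.py >> $LOG_FILE 2>&1 &
# Note: after "&", $? only says the background job was launched, not that watchdog.py succeeded
LOG "EXECUTE_CMD /mnt/e/package python3 watchdog.py. result:$?"
exit $?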
2.1, Running start.sh manually works
start.log shows the following:
2023-04-20 19:37:26.780 EXECUTE_CMD /mnt/e/package python3 watchdog.py. result:0
2.2, At boot, rc.local runs start.sh, and the Python script launched by start.sh fails
Neither the watchdog process nor the webserver process came up. start.log shows the following:
2023-04-22 02:06:06.082EXECUTE_CMD /mnt/e/package python3 watchdog.py. result:0
Traceback (most recent call last):
  File "/mnt/e/package/watchdog.py", line 3, in <module>
    import psutil
ModuleNotFoundError: No module named 'psutil'
2.3, Suspicion: the psutil module used by the Python script is not yet available when rc.local runs
Running start.sh manually works, which shows that start.sh and watchdog.py themselves are fine.
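One way to test a suspicion like this is to record what the interpreter actually sees at boot and compare it with a manual run. The sketch below is only an illustration: import_report is a hypothetical helper name, and it uses nothing but the standard library, so it still works when psutil itself cannot be imported.

```python
import getpass
import importlib.util
import site
import sys


def import_report(module_name):
    """Collect facts that help explain why importing module_name may fail."""
    try:
        user = getpass.getuser()
    except Exception:
        user = "unknown"  # no login name could be determined
    return {
        "user": user,                             # account running this interpreter
        "executable": sys.executable,             # which Python binary is in use
        "user_site": site.getusersitepackages(),  # per-user package directory
        "found": importlib.util.find_spec(module_name) is not None,
    }
```

Printing import_report("psutil") at the top of watchdog.py, before the import psutil line, would put these values into start.log for both the boot-time run and the manual run.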
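3, Approach 1: keep start.sh running in the background and exit only after watchdog.py starts successfully
3.1, start.sh retries at most 5 times, 5 seconds apart, then exits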
#!/bin/bash
LOG_FILE="/mnt/e/package/log/start.log"
LOG()
{
    TIMES=`date +"%Y-%m-%d %H:%M:%S.%3N"`
    echo "${TIMES} $1" >> $LOG_FILE
}
EXECUTE_CMD(){
    FILE_PATH=$1
    CMD=$2
    PROGRAM=$3
    RETRY=5     # maximum number of retries
    INTERVAL=5  # seconds to wait between retries
    COUNT=1     # current attempt
    while true; do
        cd $FILE_PATH
        nohup $CMD $PROGRAM >> $LOG_FILE 2>&1 & # start the script in the background
        # Note: after "&", $? is the status of launching the job, not the script's exit status
        if [ $? -eq 0 ]; then
            # treated as success: leave the loop
            break
        fi
        if [ $COUNT -ge $RETRY ]; then
            # maximum number of retries reached: give up
            LOG "ERROR: Command['$CMD $PROGRAM'] failed even after $COUNT retries! Exiting."
            return 1
        fi
        COUNT=$(expr $COUNT + 1)
        LOG "Command['$CMD $PROGRAM'] failed, retrying in $INTERVAL seconds... (retry $COUNT/$RETRY)"
        sleep $INTERVAL
    done
    LOG "Command['$CMD $PROGRAM'] succeeded after $COUNT retryies."
    return 0
}
EXECUTE_CMD /mnt/e/package python3 watchdog.py
LOG "EXECUTE_CMD /mnt/e/package python3 watchdog.py. result:$?"
exit $?
3.2, After boot, the script reports success first, and only then is the watchdog.py error printed
2023-04-22 02:20:56.955 Command['python3 watchdog.py'] succeeded after 1 retryies.
2023-04-22 02:20:56.958 EXECUTE_CMD /mnt/e/project/anweimian/package python3 watchdog.py. result:0
Traceback (most recent call last):
  File "/mnt/e/project/anweimian/package/watchdog.py", line 3, in <module>
    import psutil
ModuleNotFoundError: No module named 'psutil'
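The ordering in this log is what you would expect from a background launch: the launcher gets control back immediately, before the child has had time to fail. To find out whether the child survived startup, the launcher has to look again a little later. Below is a minimal Python sketch of that idea. The name start_and_verify and the length of the grace period are assumptions for illustration, not part of the original scripts.

```python
import subprocess
import time


def start_and_verify(cmd, grace_seconds=2.0, poll_interval=0.1):
    """Start cmd in the background and watch it for grace_seconds.

    Returns (process, alive). alive is False if the child exited during
    the grace period; process.returncode then holds its exit status.
    """
    process = subprocess.Popen(cmd)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if process.poll() is not None:
            return process, False  # the child ended during startup
        time.sleep(poll_interval)
    return process, process.poll() is None
```

A failed import ends the interpreter almost at once, so a grace period of a second or two should be enough to catch an error like the one above. It does not prove that the program keeps working afterwards.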
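3.3, Could nohup be making start.sh ignore the exception raised when watchdog.py starts?
nohup: runs a command so that it ignores the hangup signal (SIGHUP).
&: runs a command in the background.
nohup and & do different things. nohup lets a command keep running independently of the user's terminal, so it is not affected when the ssh connection is dropped. & only puts the command in the background; when the ssh connection is dropped (the user logs out or the terminal hangs up), the command is usually terminated as well.
#nohup $CMD $PROGRAM >> $LOG_FILE 2>&1 & # original line
#changed to:
$CMD $PROGRAM >> $LOG_FILE 2>&1 & # start the script without nohup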
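For comparison, a launcher written in Python can get a similar effect to nohup ... & by starting the child in its own session, so that a hangup on the launching terminal is not delivered to it. This is a rough analogue rather than an exact equivalent of nohup, and start_detached is a hypothetical helper name.

```python
import os
import subprocess


def start_detached(cmd, log_path=os.devnull):
    """Start cmd in its own session with its output appended to log_path."""
    log_file = open(log_path, "ab")
    try:
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log_file,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # the child calls setsid() and leaves the terminal's session
        )
    finally:
        log_file.close()  # the child keeps its own copy of the descriptor
```

Neither nohup nor a new session changes what the launcher learns about the child's exit status. Both only affect how the child reacts when the terminal goes away.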
3.4, With nohup removed the error persists and the start still fails
2023-04-22 02:31:56.906 Command['python3 watchdog.py'] succeeded after 1 retryies.
2023-04-22 02:31:56.909 EXECUTE_CMD /mnt/e/project/anweimian/package python3 watchdog.py. result:0
Traceback (most recent call last):
  File "/mnt/e/project/anweimian/package/watchdog.py", line 3, in <module>
    import psutil
ModuleNotFoundError: No module named 'psutil'
4, Approach 2: check whether the process exists instead of using the return value
4.1, The start.sh script
#!/bin/bash
LOG_FILE="/mnt/e/package/log/start.log"
LOG()
{
    TIMES=`date +"%Y-%m-%d %H:%M:%S.%3N"`
    echo "${TIMES} $1" >> $LOG_FILE
}
RUN_PYTHON()
{
    FILE_PATH=$1
    PROGRAM=$2
    while true # keep checking whether the script has stopped
    do
        procnum=$(ps -ef | grep "$PROGRAM" | grep -v grep | wc -l) # number of running $PROGRAM processes
        if [[ ${procnum} == 0 ]] ; then # none running: the script has stopped and must be restarted
            LOG "procnum[$procnum] not found $PROGRAM"
            cd $FILE_PATH
            nohup python3 $PROGRAM >> $LOG_FILE 2>&1 & # restart the script
            sleep 2 # wait 2s before checking again
        else
            LOG "procnum[$procnum] found $PROGRAM"
            sleep 60 # sleep 60s; check once every 60s
        fi
    done
}
RUN_PYTHON /mnt/e/package watchdog.py
# Not reached while the loop above keeps running
LOG "RUN_PYTHON /mnt/e/package watchdog.py. result:$?"
exit $?
4.1, After boot, watchdog.py still fails to start and psutil is still not found
2023-04-22 02:42:47.519 procnum[0] not found watchdog.py
Traceback (most recent call last):
  File "/mnt/e/project/anweimian/package/watchdog.py", line 3, in <module>
    import psutil
ModuleNotFoundError: No module named 'psutil'
This really should not happen: even if psutil were slow to become available, it should load eventually. Yet no matter how long start.sh loops and waits, watchdog.py keeps reporting the same error.
4.2, The start.sh process started by rc.local belongs to root
4.3, The start.sh process started manually belongs to username
2023-04-22 02:51:11.555 procnum[0] not found watchdog.py
2023-04-22 02:51:13.573 procnum[1] found watchdog.py
2023-04-22 02:52:13.590 procnum[1] found watchdog.py
4.4, Could the failure be caused by the different users launching start.sh, one having the psutil module and the other not?
Add the following command to the script:
LOG "user name = ${USER}"
4.4.1, At boot, rc.local starts start.sh and the printed user name is root
2023-04-20 00:55:24.271 user name = root
4.4.2, Starting start.sh manually prints the current username
2023-04-20 00:56:44.329 user name = username
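A plausible explanation, which these logs do not prove, is that psutil was installed only for username, for example with pip's --user option. Python looks for per-user packages under the home directory of the account running it, so root and username search different directories. The sketch below shows the default location. default_user_site is a hypothetical helper, and it assumes the standard CPython layout on Linux with PYTHONUSERBASE unset.

```python
import os
import sys


def default_user_site(home):
    """Default per-user site-packages directory for a given home directory.

    Assumes the standard CPython layout on Linux with PYTHONUSERBASE unset.
    """
    version = "python{}.{}".format(sys.version_info.major, sys.version_info.minor)
    return os.path.join(home, ".local", "lib", version, "site-packages")
```

Comparing default_user_site("/root") with default_user_site("/home/username") shows two different directories, so a package installed into one is invisible to the other account. On a real system, site.getusersitepackages() reports the directory for the current user.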
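4.4.3, To trace which commands a script runs, use exec 1>>file and set -x
# Redirect everything that would be printed to the terminal into the file
exec 1>> $LOG_FILE 2>&1
# From here on, print each command before it is executed
set -x
# Stop printing the executed commands
set +x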
5, Approach 3: specify the user that starts watchdog.py
5.1, The modified start.sh
#!/bin/bash
LOG_FILE="/mnt/e/package/log/start.log"
# Send stdout and stderr to the log file (">" truncates it on every start)
exec 1> $LOG_FILE 2>&1
LOG()
{
    TIMES=`date +"%Y-%m-%d %H:%M:%S.%3N"`
    echo "${TIMES} $1" >> $LOG_FILE
}
RUN_PYTHON()
{
    FILE_PATH=$1
    PROGRAM=$2
    set -x
    cd $FILE_PATH
    # sudo -S reads the password from stdin; -u runs python3 as username
    nohup echo password|sudo -S -u username python3 $PROGRAM >> $LOG_FILE 2>&1 & # start the script
    set +x
}
RUN_PYTHON /mnt/e/package watchdog.py
LOG "RUN_PYTHON /mnt/e/package watchdog.py. result:$?"
exit $?
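In the command above, nohup is applied to echo rather than to sudo. Also, because rc.local already runs as root, the password pipe may be unnecessary: the sudoers documentation states that no password is required when the invoking user is root. If the launcher were written in Python, the command could be built as an argument list, as in this sketch. build_launch_command and as_shell_line are hypothetical helper names, and username is the same placeholder used in the script.

```python
import shlex


def build_launch_command(user, program, python="python3"):
    """Build the argument list that runs program as user through sudo."""
    return ["sudo", "-u", user, python, program]


def as_shell_line(argv):
    """Render an argument list as a safely quoted shell command line."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

The list returned by build_launch_command("username", "watchdog.py") can be passed directly to subprocess.Popen, and as_shell_line is handy for logging the exact command that was run.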
5.2, After rebooting, the problem is solved
start.log shows the following:
+ cd /mnt/e/package
+ nohup echo password
+ set +x
+ sudo -S -u username python3 watchdog.py
2023-04-22 03:20:51.157 RUN_PYTHON /mnt/e/package watchdog.py. result:0