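I. Using monit, an open-source lightweight process monitoring tool
While killing processes on a server today, I found one that kept coming back. I went through the server's scheduled jobs and found nothing, and there was no supervisord process manager either. It turned out the server was running monit. monit is an open-source, lightweight monitoring tool that is quite capable and covers most common needs. It can monitor at several levels: it can track a process through its PID file or a port number, and it can also watch server metrics such as load and memory, so that it can maintain processes automatically and send alerts. An overview of monit:
System monitoring: process status, system load, CPU load, memory usage, and so on.
Process monitoring: monit can monitor daemon processes and automatically restart a monitored process when it exits abnormally.
File system: monit can monitor changes to local files, directories, and file systems, including changes in timestamp, checksum, and size. For example, it can track a file's SHA1 or MD5 value to detect whether the file has changed.
Network monitoring: monit can monitor network connections, supporting TCP, UDP, Unix domain sockets, and protocols such as HTTP and SMTP.
monit is very easy to install and use: one command installs it, then you edit the configuration file and start the daemon, and process monitoring is in place.
Step 1: Install monit
Installation is simple: sudo yum install -y monit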
[online@ER9 src]$ sudo yum install -y monit
......
Resolving Dependencies
--> Running transaction check
---> Package monit.x86_64 0:5.25.1-1.el6 will be installed
--> Finished Dependency Resolution
Installed:......
Complete!
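Step 2: Edit the configuration file
Edit the configuration file; here I use it to monitor the nginx service. Open /etc/monit.conf and append the following four lines at the end. Line by line, they mean:
Line 1: check process nginx with pidfile gives the location of nginx's pid file. The nginx after process is the name of the monitored service, which shows up in the status output.
Lines 2 and 3: start program and stop program point to the shell scripts that start and stop the process; both must be absolute paths.
Line 4: if failed host ... port ... means that if the connection to that IP address and port fails, monit restarts the service (stop, then start). The configuration is as follows: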
[online@ER9 nginx]$ vim /etc/monit.conf
# Check interval in seconds; the default is 120s
set daemon 30
set mailserver localhost
set mail-format { from: test@04007.cn }
set alert test@04007.cn
# Append the following block at the end and adjust it as needed
check process nginx with pidfile /opt/data/pid/nginx/nginx.pid
start program = "/opt/modules/nginx/sbin/start.sh"
stop program = "/opt/modules/nginx/sbin/stop.sh"
if failed host 192.168.168.11 port 80 then restart
# The corresponding start and stop scripts
[online@ER9 nginx]$ sudo cat /opt/modules/nginx/sbin/start.sh
#!/usr/bin/env bash
cd /opt/modules/nginx
/opt/modules/nginx/sbin/nginx
exit $?
[online@ER9 nginx]$ sudo cat /opt/modules/nginx/sbin/stop.sh
#!/usr/bin/env bash
cd /opt/modules/nginx
killall nginx
exit $?
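One thing to keep in mind about the stop script above: killall nginx signals every process named nginx on the host, which may be broader than intended if more than one instance runs there. Below is a minimal sketch of a narrower alternative that only signals the PID recorded in the pid file. The function name stop_by_pidfile and the 10-second grace period are assumptions for illustration, not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop only the process recorded in a pid file.
# Usage: stop_by_pidfile /opt/data/pid/nginx/nginx.pid [grace_seconds]
stop_by_pidfile() {
    local pidfile="$1"
    local grace="${2:-10}"   # seconds to wait for exit (assumed default)
    local pid
    local waited=0

    # A missing or empty pid file is treated as "already stopped".
    [ -s "$pidfile" ] || return 0
    pid="$(cat "$pidfile")"

    # kill -0 sends no signal; it only tests that the process exists
    # and that we are allowed to signal it.
    kill -0 "$pid" 2>/dev/null || return 0

    kill -TERM "$pid" 2>/dev/null

    # Poll once per second until the process is gone or the grace period ends.
    while kill -0 "$pid" 2>/dev/null; do
        [ "$waited" -ge "$grace" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

It would need to run as a user allowed to signal the nginx master process (root in the status output shown later), for example from stop.sh in place of the killall line.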
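Step 3: Start the monit daemon
Start the daemon by running monit, then use monit status to list what is being monitored, for example the monitored Process entries.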
[online@ER9 nginx]$ which monit
/usr/bin/monit
[online@ER9 nginx]$ sudo /usr/bin/monit
New Monit id: bfada1971c1890edd164f7951d5d1a69
Stored in '/root/.monit.id'
Starting Monit 5.25.1 daemon with http interface at [localhost]:2812
# Check monit's status
[online@ER9 nginx]$ sudo monit status
Monit 5.25.1 uptime: 0m
Process 'nginx'
status OK
monitoring status Monitored
monitoring mode active
on reboot start
pid 23273
parent pid 1
uid 0
effective uid 0
gid 0
uptime 19m
threads 1
children 16
cpu -
cpu total -
memory 0.0% [1.9 MB]
memory total 2.9% [454.3 MB]
security attribute (null)
disk write 0 B/s [8 kB total]
port response time 0.140 ms to 192.168.168.11:80 type TCP/IP protocol DEFAULT
data collected Tue, 19 Mar 2019 15:53:19
System 'BFG-OSER-4469'
status OK
monitoring status Monitored
monitoring mode active
on reboot start
load average [0.00] [0.00] [0.00]
cpu 0.0%us 0.0%sy 0.0%wa
memory usage 1.8 GB [11.6%]
swap usage 180 kB [0.0%]
uptime 746d 23h 59m
boot time Thu, 02 Mar 2017 15:53:35
data collected Tue, 19 Mar 2019 15:53:19
Now you can run stop.sh on the server and wait for monit to bring the nginx process back up.
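To watch the restart happen rather than guess at it, one option is to poll the pid file until it names a live process again. This is only a sketch: the function name wait_for_process and the 90-second default timeout are assumptions, chosen so that a few 30-second monit cycles fit inside the wait.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until the pid file names a running process.
# Usage: wait_for_process /opt/data/pid/nginx/nginx.pid [timeout_seconds]
wait_for_process() {
    local pidfile="$1"
    local timeout="${2:-90}"   # assumed default, roughly three 30s cycles
    local waited=0
    local pid

    while [ "$waited" -lt "$timeout" ]; do
        if [ -s "$pidfile" ]; then
            pid="$(cat "$pidfile")"
            # kill -0 tests for existence without sending a signal;
            # it needs permission to signal the target process.
            if kill -0 "$pid" 2>/dev/null; then
                echo "process $pid is running after ${waited}s"
                return 0
            fi
        fi
        sleep 1
        waited=$((waited + 1))
    done

    echo "no live process found within ${timeout}s" >&2
    return 1
}
```

In a root shell, running stop.sh and then wait_for_process /opt/data/pid/nginx/nginx.pid 90 should report the new nginx PID once monit has restarted the service.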
II. A collection of common monit monitoring configuration examples
1. monit test syntax:
IF <TEST> THEN ACTION [ELSE IF SUCCEEDED THEN ACTION]
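【action】 is one of alert, start, stop, restart, exec
【alert】 sends an alert email
【start|stop|restart】 runs the start, stop, or restart program. restart runs stop first and then start.
【exec】 runs a custom script
2. Common monit configuration patterns and examples
monit lets you put the start and stop commands in separate shell files or write them directly in the configuration. It can monitor a service by IP or by host name. You can set the check period per service using cron-style fields (minute, hour, day, month, weekday), and you can also exclude time windows in which no checks should run. One page with a fairly detailed introduction is at http://www.cnblogs.com/52fhy/p/6412547.html . I have also collected many common monit monitoring examples here as notes, so they can be looked up and applied quickly when needed: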
# Start and stop commands kept in separate shell files
check process nginx with pidfile /opt/data/pid/nginx/nginx.pid
start program = "/opt/modules/nginx/sbin/start.sh"
stop program = "/opt/modules/nginx/sbin/stop.sh"
if failed host 192.168.162.11 port 80 then restart
if failed host www.04007.cn port 443 then restart
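# If the service is restarted 9 times within 10 monitoring cycles, time out and stop monitoring it.
if 9 restarts within 10 cycles then timeout
# Alert if the service's CPU usage stays above 90% for 5 cycles
if cpu usage > 90% for 5 cycles then alert
# Restart the service if the URL fails for 5 consecutive cycles (120-second timeout; a timeout also counts as a failure)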
if failed url http://127.0.0.1:4000/ timeout 120 seconds for 5 cycles then restart
# Use exec to run another script when a check fails
check host 192.168.11.12 with address 192.168.11.12
if failed port 80 with timeout 1 seconds for 2 cycles then exec "/opt/monit/contrab/notify.py"
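The exec action above hands control to an external script, and the contents of notify.py are not shown. As a rough illustration of what such a script can do, here is a sketch in shell (not the Python file named in the config). It relies on the MONIT_DATE, MONIT_HOST, MONIT_SERVICE, MONIT_EVENT and MONIT_DESCRIPTION environment variables that monit sets for programs it executes. The function name format_monit_event and the fallback values are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the environment monit passes to exec'd
# programs into a single log line.
format_monit_event() {
    # Each ${VAR:-fallback} keeps the line well-formed even when the
    # function is run by hand, outside of monit.
    printf '%s host=%s service=%s event="%s" detail="%s"\n' \
        "${MONIT_DATE:-unknown-date}" \
        "${MONIT_HOST:-unknown-host}" \
        "${MONIT_SERVICE:-unknown-service}" \
        "${MONIT_EVENT:-unknown-event}" \
        "${MONIT_DESCRIPTION:-}"
}
```

A script used as the exec target could call format_monit_event and append the result to a log file or pipe it into whatever notification channel is in use.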
# Start and stop commands can also be written directly in the configuration
check process nginx with pidfile /usr/local/nginx/run/nginx.pid
start program = "/usr/local/nginx/sbin/nginx" with timeout 30 seconds
stop program = "/usr/local/nginx/sbin/nginx -s stop"
if failed host 192.168.162.11 port 80 protocol http then restart
if 5 restarts within 5 cycles then timeout
# A check period can be set per service, either in cycles or with a cron-style schedule (minute, hour, day, month, weekday)
# Example 1: Check every 2 cycles
check process nginx with pidfile /var/run/nginx.pid
every 2 cycles
# Example 2: Check every workday 8AM-7PM
check program checkOracleDatabase with
path /var/monit/programs/checkoracle.pl
every "* 8-19 * * 1-5"
# Example 3: Skip checks during this window, Sunday 0AM-3AM
check process mysqld with pidfile /var/run/mysqld.pid
not every "* 0-3 * * 0"
# send/expect can be used to send data and check the response
check process apache with pidfile /var/run/httpd.pid
start "/etc/init.d/httpd start"
stop "/etc/init.d/httpd stop"
if failed
host www.sol.no port 80 and
send "GET / HTTP/1.1\r\nHost: www.04007.cn\r\n\r\n"
expect "HTTP/[0-9\.]{3} 200.*"
then alert
# Watch a configuration file and reload when it changes
check file monit.conf path /etc/monit.conf
group system
if changed sha1 checksum
then exec "/usr/bin/monit -c /etc/monit.conf reload"
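Reloading automatically on every change means a half-edited file can be picked up too. A cautious variant is to run monit's syntax check (monit -t) first and reload only if it passes. The sketch below assumes monit -t exits non-zero when the control file has errors. The function name reload_if_valid is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reload monit only if the control file passes
# monit's own syntax check.
# Usage: reload_if_valid [/etc/monit.conf]
reload_if_valid() {
    local conf="${1:-/etc/monit.conf}"

    # monit -t runs a syntax check of the control file (assumed to
    # return a non-zero exit status on errors).
    if monit -t -c "$conf"; then
        monit -c "$conf" reload
    else
        echo "syntax check failed for $conf, not reloading" >&2
        return 1
    fi
}
```

Wrapped in a small script, this could serve as the exec target instead of calling reload directly.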
# Monitor the sshd service
check process sshd with pidfile /var/run/sshd.pid
start program "/etc/init.d/sshd start"
stop program "/etc/init.d/sshd stop"
if failed port 22 protocol ssh then restart
if 5 restarts within 5 cycles then timeout
# Monitor the mysql service
check process mysql with pidfile /var/run/mysqld/mysqld.pid
group database
start program = "/etc/init.d/mysqld start"
stop program = "/etc/init.d/mysqld stop"
if failed host 127.0.0.1 port 3306 then restart
if 5 restarts within 5 cycles then timeout
# Monitor php-fpm
check process php_fpm with pidfile /var/run/php_fpm.pid
start program = "/etc/init.d/php_fpm start"
stop program = "/etc/init.d/php_fpm stop"
if failed host 127.0.0.1 port 9000 then restart
if 5 restarts within 5 cycles then timeout