Monitoring keepalived master/backup state and split-brain with Zabbix
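Contents
- Monitoring keepalived master/backup state and split-brain with Zabbix
- Environment:
- 1. Configure the scripts keepalived uses to track master/backup state
- Write the script on the master host
- Write the script on the slave host
- 2. Add the monitoring scripts to the keepalived configuration
- 2.1. Configure keepalived on the master
- 2.2. Configure keepalived on the backup
- Round 1 test (checking that keepalived monitors the haproxy load balancer)
- 3. Monitor keepalived
- 3.1. Install the agent on the slave host (haproxy2)
- 3.2. Write a script that reports the service state (scripts are kept in one place; here we use a dedicated directory, /scripts, rather than a user's home directory, to avoid permission problems later)
- 3.3. Add the host
- 3.4. Create the item
- 3.5. Create the trigger
- 4.6. Round 2 test
- 4.7. Email notification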
Environment:
Server name | IP address | Services / stack | OS version |
---|---|---|---|
zabbix | 192.168.195.130 | LAMP stack, zabbix_server | centos 8 |
haproxy1 | 192.168.195.133 | keepalived, haproxy | centos 8 |
haproxy2 | 192.168.195.134 | keepalived, haproxy | centos 8 |
web1 | 192.168.195.135 | http | centos 8 |
web2 | 192.168.195.136 | http | centos 8 |
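Note: for details on the Zabbix monitoring service, custom monitoring, and haproxy load-balancing configuration used in the steps below, see the following posts:
Deploying the Zabbix monitoring service - CSDN blog
Zabbix custom monitoring - 碳烤小肥杨…'s blog - CSDN blog
haproxy load balancing - CSDN blog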
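1. Configure the scripts keepalived uses to track master/backup state
keepalived monitors the state of the load balancer through a script.
Write the script on the master host
//This script checks whether a haproxy process exists on the master host. If none exists, the service has failed and cannot serve traffic, so when the haproxy process count is below 1 the script stops keepalived, which releases the VIP so the backup can take over.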
[root@haproxy1 ~]# mkdir /scripts && cd /scripts
[root@haproxy1 scripts]# vim check_haproxy.sh
[root@haproxy1 scripts]# cat check_haproxy.sh
#!/bin/bash
haproxy_status=$(ps -ef|grep -Ev "grep|$0"|grep '\bhaproxy\b'|wc -l)
if [ $haproxy_status -lt 1 ];then
    systemctl stop keepalived
fi
[root@haproxy1 scripts]# chmod +x check_haproxy.sh
[root@haproxy1 scripts]# ll
total 4
-rwxr-xr-x 1 root root 148 Oct 13 21:21 check_haproxy.sh
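The `ps | grep` pipeline above can also be written with `pgrep -x`, which matches the exact process name. The following is only a minimal sketch of that idea, not the script used in this post: the helper names `proc_count` and `run_if_missing` are hypothetical, and the action is passed as arguments so it can be tried with `echo` before it is allowed to stop anything.

```shell
#!/bin/bash
# Hypothetical variant of check_haproxy.sh (helper names are illustrative only).

# Print how many processes have exactly the given name.
proc_count() {
    pgrep -x "$1" | wc -l
}

# Run the given command only when no process with that name exists.
run_if_missing() {
    local name="$1"
    shift
    if [ "$(proc_count "$name")" -lt 1 ]; then
        "$@"
    fi
}

# Dry run: prints a message instead of stopping keepalived.
# run_if_missing haproxy echo "haproxy is down; would stop keepalived"
# Real use (as root), mirroring the script above:
# run_if_missing haproxy systemctl stop keepalived
```

Starting with the dry-run line makes it easy to confirm the detection works before wiring the real command into keepalived.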
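Write the script on the slave host
//This script acts on the state this host has just entered (master|backup). When this host becomes master, the first branch runs: if fewer than 1 haproxy process is running, it starts haproxy so load balancing continues. When this host returns to backup, the second branch runs: if more than 0 haproxy processes are running, it stops haproxy, so that it does not conflict with haproxy on the master host and prevent traffic from reaching the backend web hosts correctly.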
[root@haproxy2 ~]# mkdir /scripts && cd /scripts
[root@haproxy2 scripts]# vim notify.sh
[root@haproxy2 scripts]# cat notify.sh
#!/bin/bash
case "$1" in
    master)
        haproxy_status=$(ps -ef|grep -Ev "grep|$0"|grep '\bhaproxy\b'|wc -l)
        if [ $haproxy_status -lt 1 ];then
            systemctl start haproxy
        fi
    ;;
    backup)
        haproxy_status=$(ps -ef|grep -Ev "grep|$0"|grep '\bhaproxy\b'|wc -l)
        if [ $haproxy_status -gt 0 ];then
            systemctl stop haproxy
        fi
    ;;
    *)
        echo "Usage:$0 master|backup VIP"
    ;;
esac
[root@haproxy2 scripts]# chmod +x notify.sh
[root@haproxy2 scripts]# ls
notify.sh
[root@haproxy2 scripts]# ll
total 4
-rwxr-xr-x 1 root root 461 Oct 13 21:25 notify.sh
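To make the two branches of notify.sh easier to reason about, the decision can be modelled as a pure function that only prints what it would do. This is a dry-run sketch under our own naming (`decide_action` is hypothetical and is not part of keepalived or of the script above); it takes the new state and the current haproxy process count as arguments.

```shell
#!/bin/bash
# Dry-run model of the notify.sh decision logic (function name is hypothetical).
# $1: new VRRP state (master|backup); $2: number of running haproxy processes.
decide_action() {
    local state="$1" running="$2"
    case "$state" in
        master)
            if [ "$running" -lt 1 ]; then echo start; else echo none; fi
            ;;
        backup)
            if [ "$running" -gt 0 ]; then echo stop; else echo none; fi
            ;;
        *)
            echo "Usage: decide_action master|backup COUNT" >&2
            return 1
            ;;
    esac
}

decide_action master 0   # prints "start"
decide_action backup 2   # prints "stop"
decide_action backup 0   # prints "none"
```

Keeping the decision separate from the `systemctl` calls lets the logic be checked on any machine, with or without haproxy installed.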
2. Add the monitoring scripts to the keepalived configuration
2.1. Configure keepalived on the master
[root@haproxy1 ~]# vim /etc/keepalived/keepalived.conf
[root@haproxy1 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id haproxy1
}
vrrp_script haproxy_check {
    script "/scripts/check_haproxy.sh"
    interval 1
    fall 3
    weight -40
}
vrrp_instance VI_1 {
    state MASTER
    interface ens160
    virtual_router_id 80
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 12345678
    }
    virtual_ipaddress {
        192.168.195.100
    }
    track_script {
        haproxy_check
    }
}
virtual_server 192.168.195.100 80 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT
    persistence_timeout 50
    protocol TCP
    real_server 192.168.195.133 80 {
        weight 1
        TCP_CHECK {
            connect_port 80
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
    real_server 192.168.195.134 80 {
        weight 1
        TCP_CHECK {
            connect_port 80
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}
[root@haproxy1 ~]# systemctl restart keepalived.service
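The `weight -40` in `vrrp_script` deserves a quick worked example. With a negative weight, keepalived lowers the instance priority while the tracked script is failing, so the master's 100 would drop to 60, below the backup's 80 configured in the next step. The sketch below only models that arithmetic; `effective_priority` is a hypothetical helper, and positive weights are deliberately left out. Note that in this setup check_haproxy.sh stops keepalived outright, so the weight would mainly matter if the script reported failure through its exit status instead.

```shell
#!/bin/bash
# Simplified model of a vrrp_script with a negative weight
# (function name is hypothetical; positive weights are not modelled).
# $1: configured priority, $2: weight, $3: 1 if the check is failing, else 0.
effective_priority() {
    local base="$1" weight="$2" failing="$3"
    if [ "$failing" -eq 1 ] && [ "$weight" -lt 0 ]; then
        echo $((base + weight))
    else
        echo "$base"
    fi
}

master=$(effective_priority 100 -40 1)   # 100 - 40 = 60
backup=80
if [ "$master" -lt "$backup" ]; then
    echo "master drops to $master, below backup priority $backup"
fi
```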
2.2. Configure keepalived on the backup
[root@haproxy2 ~]# vim /etc/keepalived/keepalived.conf
[root@haproxy2 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id haproxy2
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens160
    virtual_router_id 80
    priority 80
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 12345678
    }
    virtual_ipaddress {
        192.168.195.100
    }
    notify_master "/scripts/notify.sh master"
    notify_backup "/scripts/notify.sh backup"
}
virtual_server 192.168.195.100 80 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT
    persistence_timeout 50
    protocol TCP
    real_server 192.168.195.133 80 {
        weight 1
        TCP_CHECK {
            connect_port 80
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
    real_server 192.168.195.134 80 {
        weight 1
        TCP_CHECK {
            connect_port 80
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}
[root@haproxy2 ~]# systemctl restart keepalived.service
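Both nodes must agree on `virtual_router_id`, `advert_int` and `auth_pass`, as they do in the two files above; a typo in one of them is a common way to end up with both nodes claiming the VIP. A small sanity check could look like the sketch below. The helper names and the file names `master.conf` and `backup.conf` are assumptions for illustration, and the parsing is deliberately naive (first match of a keyword, one value per line).

```shell
#!/bin/bash
# Print the first value of a single-valued keyword (e.g. virtual_router_id)
# found in a keepalived.conf-style file. $1: file, $2: keyword.
conf_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Succeed only when both files carry the same value for every listed keyword.
same_vrrp_settings() {
    local a="$1" b="$2" key
    for key in virtual_router_id advert_int auth_pass; do
        if [ "$(conf_value "$a" "$key")" != "$(conf_value "$b" "$key")" ]; then
            echo "mismatch: $key" >&2
            return 1
        fi
    done
    return 0
}

# Example, assuming both files were copied to the current directory:
# same_vrrp_settings master.conf backup.conf && echo "VRRP settings match"
```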
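Round 1 test (checking that keepalived monitors the haproxy load balancer)
Check the service state before the test
Master host
//keepalived and haproxy are both running normally; check the VIP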
[root@haproxy1 ~]# systemctl is-active haproxy.service
active
[root@haproxy1 ~]# systemctl is-active keepalived.service
active
[root@haproxy1 ~]# ip a show ens160 | grep 192.168.195.100
inet 192.168.195.100/32 scope global ens160
[root@haproxy1 ~]# ip a show ens160 | grep 192.168.195.100 | wc -l
1
[root@haproxy1 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:80 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 0.0.0.0:8189 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
Slave host
//haproxy is stopped; keepalived stays running
[root@haproxy2 ~]# systemctl is-active haproxy.service
inactive
[root@haproxy2 ~]# systemctl is-active keepalived.service
active
[root@haproxy2 ~]# ip a show ens160 | grep 192.168.195.100
[root@haproxy2 ~]# ip a show ens160 | grep 192.168.195.100 | wc -l
0
[root@haproxy2 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
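During the test
Simulate haproxy on the master host (haproxy1) shutting down under overload
//After haproxy is stopped, the script tracked in the keepalived configuration detects that the haproxy process is gone and runs the command that stops keepalived. The VIP is released and moves to the slave host (haproxy2), which becomes the new master.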
[root@haproxy1 ~]# systemctl stop haproxy.service
[root@haproxy1 ~]# systemctl is-active haproxy.service
inactive
[root@haproxy1 ~]# systemctl is-active keepalived.service
inactive
[root@haproxy1 ~]# ip a show ens160 | grep 192.168.195.100
[root@haproxy1 ~]# ip a show ens160 | grep 192.168.195.100 | wc -l
0
[root@haproxy1 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
//Now check haproxy and the VIP on the slave host (haproxy2). The VIP has moved to this host; as the new master, keepalived runs the notify script from its configuration, which starts haproxy so load balancing continues.
[root@haproxy2 ~]# systemctl is-active haproxy.service
active
[root@haproxy2 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:8189 0.0.0.0:*
LISTEN 0 128 0.0.0.0:80 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
[root@haproxy2 ~]# ip a show ens160 | grep 192.168.195.100
inet 192.168.195.100/32 scope global ens160
[root@haproxy2 ~]# ip a show ens160 | grep 192.168.195.100 | wc -l
1
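//Once the operations staff have repaired the original master (haproxy1) and haproxy is running again, we start keepalived again. haproxy1 takes the VIP back and becomes master again, while the slave host gives up the master role.
Master host (haproxy1)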
[root@haproxy1 ~]# systemctl start haproxy.service keepalived.service
[root@haproxy1 ~]# systemctl is-active haproxy.service
active
[root@haproxy1 ~]# systemctl is-active keepalived.service
active
[root@haproxy1 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:80 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 0.0.0.0:8189 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
[root@haproxy1 ~]# ip a show ens160 | grep 192.168.195.100
inet 192.168.195.100/32 scope global ens160
[root@haproxy1 ~]# ip a show ens160 | grep 192.168.195.100 | wc -l
1
Slave host (haproxy2)
[root@haproxy2 ~]# systemctl is-active haproxy.service
inactive
[root@haproxy2 ~]# systemctl is-active keepalived.service
active
[root@haproxy2 ~]# ip a show ens160 | grep 192.168.195.100
[root@haproxy2 ~]# ip a show ens160 | grep 192.168.195.100 | wc -l
0
[root@haproxy2 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
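3. Monitor keepalived
Monitoring of keepalived should run on the backup server, using a Zabbix custom item.
The value monitored is whether the VIP (192.168.195.100) is present on the backup.
The VIP appears on the backup in two situations:
- split-brain has occurred
- a normal master/backup switchover
This monitoring only flags a possible split-brain; it cannot confirm one, because a normal switchover also moves the VIP to the backup.
The monitoring script is as follows: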
[root@haproxy2 ~]# cd /scripts/
[root@haproxy2 scripts]# vim check_keepalived.sh
[root@haproxy2 scripts]# chmod +x check_keepalived.sh
[root@haproxy2 scripts]# cat check_keepalived.sh
#!/bin/bash
if [ `ip a show ens160 | grep 192.168.195.100 | wc -l` -ne 0 ]
then
    echo "1"
else
    echo "0"
fi
[root@haproxy2 scripts]# ll
total 8
-rwxr-xr-x 1 root root 115 Oct 14 00:06 check_keepalived.sh
-rwxr-xr-x 1 root root 444 Oct 13 22:01 notify.sh
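One caveat with a plain `grep 192.168.195.100`: the dots match any character and the pattern matches substrings, so a shorter VIP such as 192.168.195.10 would also match a line containing 192.168.195.100. A slightly stricter test, sketched below, matches the fixed string `inet ADDR/` instead. The function name `vip_flag` is hypothetical, and reading from stdin is our choice so that the check can be tried on saved `ip` output.

```shell
#!/bin/bash
# Print 1 if the given address appears as an "inet ADDR/" entry on stdin, else 0.
# $1: the VIP to look for.
vip_flag() {
    if grep -qF "inet $1/"; then
        echo 1
    else
        echo 0
    fi
}

# Real use on haproxy2:
# ip -4 addr show ens160 | vip_flag 192.168.195.100
```

The output keeps the same 0/1 contract as check_keepalived.sh, so it could be dropped in without changing anything on the Zabbix side.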
3.1. Install the agent on the slave host (haproxy2)
//Download zabbix
[root@haproxy2 ~]# wget https://cdn.zabbix.com/zabbix/sources/stable/6.4/zabbix-6.4.6.tar.gz
--2023-10-14 00:47:19-- https://cdn.zabbix.com/zabbix/sources/stable/6.4/zabbix-6.4.6.tar.gz
Resolving cdn.zabbix.com (cdn.zabbix.com)... 172.67.69.4, 104.26.6.148, 104.26.7.148, ...
Connecting to cdn.zabbix.com (cdn.zabbix.com)|172.67.69.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 43744978 (42M) [application/octet-stream]
Saving to: ‘zabbix-6.4.6.tar.gz.1’
zabbix-6.4.6.tar.gz.1 100%[==========================================>] 41.72M 600KB/s in 2m 0s
2023-10-14 00:49:20 (356 KB/s) - ‘zabbix-6.4.6.tar.gz.1’ saved [43744978/43744978]
[root@haproxy2 ~]# ls
anaconda-ks.cfg haproxy-2.7.10 haproxy-2.7.10.tar.gz zabbix-6.4.6.tar.gz
//Create the user and extract the zabbix archive
[root@haproxy2 ~]# tar xf zabbix-6.4.6.tar.gz -C /usr/local/
[root@haproxy2 ~]# cd /usr/local/ && ls
bin etc games haproxy include lib lib64 libexec sbin share src zabbix-6.4.6
[root@haproxy2 local]# cd zabbix-6.4.6/
[root@haproxy2 zabbix-6.4.6]# useradd -r -M -s /sbin/nologin zabbix
//Install the packages required for compiling
[root@haproxy2 zabbix-6.4.6]# yum -y install gcc gcc-c++ make
//Run configure in the zabbix-6.4.6 directory
[root@haproxy2 zabbix-6.4.6]# ./configure --enable-agent
(output omitted) . . .
***********************************************************
* Now run 'make install' *
* *
* Thank you for using Zabbix! *
* <http://www.zabbix.com> *
***********************************************************
If the output ends with this banner, configure succeeded and you can run make install directly.
[root@haproxy2 zabbix-6.4.6]# make install
//Edit the zabbix agent configuration file
[root@haproxy2 zabbix-6.4.6]# vim /usr/local/etc/zabbix_agentd.conf
[root@haproxy2 zabbix-6.4.6]# grep -A2 '# ServerActive=' /usr/local/etc/zabbix_agentd.conf
# ServerActive=
ServerActive=192.168.195.130 //set to the server's IP
[root@haproxy2 zabbix-6.4.6]# grep -A2 '# Server=' /usr/local/etc/zabbix_agentd.conf
# Server=
Server=192.168.195.130 //set to the server's IP
[root@haproxy2 zabbix-6.4.6]# grep -A2 '# Hostname=' /usr/local/etc/zabbix_agentd.conf
# Hostname=
Hostname=note2 //set the hostname; it must be globally unique
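The three settings above were edited by hand in vim. If the same change has to be repeated on several hosts, it could be scripted along the following lines. This is a sketch, not part of the original procedure: `set_conf_key` is a hypothetical helper, it assumes GNU sed (as on CentOS 8), and it assumes the value contains no `|` or `&` characters.

```shell
#!/bin/bash
# Set KEY=VALUE in a zabbix_agentd.conf-style file.
# An existing uncommented "KEY=" line is rewritten; otherwise the pair is appended.
# Commented lines such as "# Server=" are left untouched.
set_conf_key() {
    local file="$1" key="$2" value="$3"
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example with the values used in this post (run as root on haproxy2):
# conf=/usr/local/etc/zabbix_agentd.conf
# set_conf_key "$conf" Server 192.168.195.130
# set_conf_key "$conf" ServerActive 192.168.195.130
# set_conf_key "$conf" Hostname note2
```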
//Enable zabbix_agentd at boot: copy the service file already set up on the zabbix server to the slave host (haproxy2)
[root@zabbix ~]# scp /usr/lib/systemd/system/zabbix_agentd.service root@192.168.195.134:/usr/lib/systemd/system/
root@192.168.195.134's password:
zabbix_agentd.service 100% 227 211.4KB/s 00:00
[root@haproxy2 ~]# systemctl daemon-reload
[root@haproxy2 ~]# systemctl enable --now zabbix_agentd.service
Created symlink /etc/systemd/system/multi-user.target.wants/zabbix_agentd.service → /usr/lib/systemd/system/zabbix_agentd.service.
[root@haproxy2 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:10050 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
The service started successfully.
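The service file copied from the zabbix server is not shown in this post. For readers who do not have one to copy, the sketch below prints what such a unit might look like for a source install under /usr/local. Everything in it is an assumption rather than the file used here: the function name `render_agent_unit` is hypothetical, and the PID file path is the agent's default only if `PidFile` was not changed in zabbix_agentd.conf.

```shell
#!/bin/bash
# Print a possible systemd unit for a source-built zabbix_agentd.
# All paths are assumptions based on the default /usr/local prefix.
render_agent_unit() {
    cat <<'EOF'
[Unit]
Description=Zabbix agent
After=network.target

[Service]
Type=forking
PIDFile=/tmp/zabbix_agentd.pid
ExecStart=/usr/local/sbin/zabbix_agentd -c /usr/local/etc/zabbix_agentd.conf

[Install]
WantedBy=multi-user.target
EOF
}

# Example (as root): write the unit, then reload systemd as shown above.
# render_agent_unit > /usr/lib/systemd/system/zabbix_agentd.service
```

No `User=` line is set here because the agent, when started as root, switches to the zabbix user on its own by default.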
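3.2. Write a script that reports the service state (scripts are kept in one place; here we use a dedicated directory, /scripts, rather than a user's home directory, to avoid permission problems later)
//This script reports whether the VIP is present on this host. If the VIP is present on the slave host (haproxy2), haproxy on the master host (haproxy1) has probably failed or split-brain may have occurred, and the script prints 1 to signal a problem.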
[root@haproxy2 ~]# cd /scripts/
[root@haproxy2 scripts]# vim check_keepalived.sh
[root@haproxy2 scripts]# cat check_keepalived.sh
#!/bin/bash
if [ `ip a show ens160 | grep 192.168.195.100 | wc -l` -ne 0 ]
then
    echo "1"
else
    echo "0"
fi
[root@haproxy2 scripts]# chmod +x check_keepalived.sh
[root@haproxy2 scripts]# ./check_keepalived.sh //0 means the VIP is not on this host
0
//Edit the agent configuration file to define the custom item
[root@haproxy2 scripts]# vim /usr/local/etc/zabbix_agentd.conf
[root@haproxy2 scripts]# tail -1 /usr/local/etc/zabbix_agentd.conf
UserParameter=check_keepalived,/bin/bash /scripts/check_keepalived.sh
//The configuration file changed, so restart the service to reload it
[root@haproxy2 scripts]# systemctl restart zabbix_agentd.service
//After defining the custom item, test from the server side that it can receive the value from the monitored host
[root@client ~]# zabbix_get -s 192.168.195.134 -k check_keepalived
0 //value received successfully
Configuration on the host is complete.
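Before moving to the web interface, it can help to spell out what the two possible values mean, since the trigger will be built on exactly this distinction. The sketch below maps the item value to a readable line; `describe_vip_state` is a hypothetical helper of ours, and the commented usage line reuses the `zabbix_get` call shown above.

```shell
#!/bin/bash
# Map the check_keepalived item value (0 or 1) to a readable status line.
describe_vip_state() {
    case "$1" in
        0) echo "OK: VIP is not on the backup" ;;
        1) echo "PROBLEM: VIP is on the backup (failover or possible split-brain)" ;;
        *) echo "UNKNOWN: unexpected value '$1'"; return 2 ;;
    esac
}

# On the zabbix server, using the same zabbix_get call as above:
# describe_vip_state "$(zabbix_get -s 192.168.195.134 -k check_keepalived)"
```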
3.3. Add the host
3.4. Create the item
3.5. Create the trigger
4.6. Round 2 test
Simulate haproxy on the master host (haproxy1) shutting down under overload
Master host (haproxy1)
[root@haproxy1 ~]# systemctl stop haproxy.service
[root@haproxy1 ~]# systemctl is-active haproxy.service
inactive
[root@haproxy1 ~]# systemctl is-active keepalived.service
inactive
[root@haproxy1 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
[root@haproxy1 ~]# ip a show ens160 | grep 192.168.195.100
[root@haproxy1 ~]# ip a show ens160 | grep 192.168.195.100 | wc -l
0
Slave host (haproxy2)
[root@haproxy2 ~]# systemctl is-active haproxy.service
active
[root@haproxy2 ~]# systemctl is-active keepalived.service
active
[root@haproxy2 ~]# ip a show ens160 | grep 192.168.195.100
inet 192.168.195.100/32 scope global ens160
[root@haproxy2 ~]# ip a show ens160 | grep 192.168.195.100 | wc -l
1
[root@haproxy2 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:8189 0.0.0.0:*
LISTEN 0 128 0.0.0.0:10050 0.0.0.0:*
LISTEN 0 128 0.0.0.0:80 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
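In this test the VIP is on haproxy2 because of a normal failover, yet the item value is 1 all the same, which is the ambiguity described in section 3. One rough way to tell the cases apart, assuming a similar VIP check were also run on haproxy1 (this post does not set one up), is to combine the two flags. The function name `classify_vip` and the labels it prints are hypothetical.

```shell
#!/bin/bash
# Classify the cluster state from two 0/1 flags.
# $1: 1 if the master (haproxy1) holds the VIP, else 0.
# $2: 1 if the backup (haproxy2) holds the VIP, else 0.
classify_vip() {
    case "$1$2" in
        10) echo "normal" ;;
        01) echo "failover" ;;
        11) echo "split-brain" ;;
        00) echo "no-vip" ;;
        *)  echo "invalid input" >&2; return 1 ;;
    esac
}

classify_vip 0 1   # the situation in this test: prints "failover"
classify_vip 1 1   # both nodes hold the VIP: prints "split-brain"
```

If the master cannot be reached at all, its flag is unknown and this classification cannot be made, so the single-host check remains a useful first signal.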
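4.7. Email notification
To set up email notifications, see the detailed steps in the following post:
Zabbix email alerting (defining media types, configuring actions) - 碳烤小肥杨…'s blog - CSDN blog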