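Patroni is an open-source tool written in Python, originally developed by Zalando, for creating, managing, maintaining, and monitoring highly available PostgreSQL clusters built on streaming replication.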
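Today, Patroni + Etcd is one of the most widely recommended high-availability setups for PostgreSQL.
PostgreSQL has postgres_exporter as its metrics collector. But is there a monitoring solution for the Patroni high-availability tool itself?
Let's take a look at the little-known patroni-exporter.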
1. Deploying patroni-exporter
Requirements: Python >= 3.6
Download: https://github.com/Showmax/patroni-exporter
Installing patroni-exporter
1.1 Install the Python dependencies
pip3.6 install prometheus_client
pip3.6 install python-dateutil
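Before moving on, it can help to confirm that the interpreter and both packages are usable. The following is a minimal sketch, assuming the import names are `prometheus_client` and `dateutil` (the module that `python-dateutil` installs); the helper names `python_ok` and `missing_modules` are made up for this illustration.

```python
import importlib.util
import sys


def python_ok(minimum=(3, 6)):
    """Return True when the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def missing_modules(names):
    """Return the module names from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("python >= 3.6:", python_ok())
    # Import names are assumptions based on the two pip packages above.
    print("missing:", missing_modules(["prometheus_client", "dateutil"]))
```

Run it with the same interpreter that the service will use (python3.6 here); an empty "missing" list suggests both packages are installed for that interpreter.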
1.2 Unzip patroni-exporter-master.zip
[root@HD-IOV-PROMETHEUS-MONITOR patroni]# ll
total 12
drwx------ 3 root root 137 Aug 1 10:39 patroni-exporter-master
-rw------- 1 root root 8566 Aug 1 10:31 patroni-exporter-master.zip
1.3 systemd unit file for patroni-exporter
# cat /etc/systemd/system/patroni_exporter.service
[Unit]
Description=patroni_exporter
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/python3.6 /root/dba_zc/patroni/patroni-exporter-master/patroni_exporter.py --port 51234 --patroni-url http://172.24.131.8:8008/patroni --timeout 5
TimeoutSec = 60
Restart = on-failure
RestartSec = 2
[Install]
WantedBy=multi-user.target
1.4 Start patroni_exporter
systemctl start patroni_exporter
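Once the service is running, a quick way to check it is to fetch the exporter's metrics page and look for the Patroni metrics. The sketch below is illustrative only: it uses a deliberately simple parser that assumes sample lines carry no trailing timestamps, and the function names `parse_metrics` and `fetch_metrics` are hypothetical.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text exposition lines into {series: float}.

    Simplification (assumption): sample lines have no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        try:
            samples[series] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return samples


def fetch_metrics(url, timeout=5):
    """Download a metrics page over HTTP and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For example, `fetch_metrics("http://127.0.0.1:51234/metrics")` should return a dictionary containing series such as `patroni_patroni_pause`. Port 51234 comes from the `--port` option in the unit file; the host and the `/metrics` path are assumptions.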
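2. Integrating patroni-exporter with Prometheus
For convenience, static registration is used here: the IP address and metrics port of each exporter are listed statically under scrape_configs in the Prometheus YAML configuration file.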
  - job_name: patroni-job
    static_configs:
      - targets: ['172.26.234.25:51234','172.26.234.25:51238','172.26.234.25:51239']
Open the Prometheus web UI; if every exporter target is shown as UP, the integration is working.
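The same check can be scripted against the Prometheus HTTP API with the instant query `up{job="patroni-job"}`. This is a minimal sketch rather than a finished tool: the function names `down_targets` and `query_up` are hypothetical, and the Prometheus base URL passed to `query_up` is whatever your server listens on.

```python
import json
import urllib.parse
import urllib.request


def down_targets(query_response):
    """Return instance labels whose `up` sample is not 1 in a query response."""
    if query_response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    down = []
    for sample in query_response["data"]["result"]:
        _timestamp, value = sample["value"]  # instant vector: [unix_time, "value"]
        if float(value) != 1.0:
            down.append(sample["metric"].get("instance", "<unknown>"))
    return sorted(down)


def query_up(prometheus_url, job="patroni-job", timeout=5):
    """Run the instant query up{job="..."} via /api/v1/query."""
    params = urllib.parse.urlencode({"query": 'up{job="%s"}' % job})
    url = "%s/api/v1/query?%s" % (prometheus_url.rstrip("/"), params)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `down_targets(query_up("http://<prometheus-host>:9090"))` would list the exporters that Prometheus currently cannot scrape; an empty list corresponds to all targets being UP. The port 9090 is the Prometheus default and an assumption here.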
3. Visualizing patroni-exporter metrics in Grafana
Using the metrics that patroni-exporter collects from Patroni 1.6.1 (as reported by patronictl version), I built a Grafana dashboard for patroni-exporter.
4. Patroni alerting with Alertmanager
# cat patroni_rules.yml
groups:
- name: for_common
  rules:
  - alert: IOV-patroni_patroni_info
    expr: patroni_patroni_info != 1
    for: 1m
    labels:
      level: 3
    annotations:
      cur_value: '{{ $value }}'
      description: '{{ $labels.instance }} of {{ $labels.job }}: patroni has been down for 1m'
  - alert: IOV-patroni_patroni_pause
    expr: patroni_patroni_pause != 0
    for: 1m
    labels:
      level: 3
    annotations:
      cur_value: '{{ $value }}'
      description: '{{ $labels.instance }} of {{ $labels.job }}: patroni has been paused for 1m'
  - alert: IOV-patroni_postgresql_timeline
    expr: changes(patroni_postgresql_timeline[1m]) != 0
    for: 1m
    labels:
      level: 3
    annotations:
      cur_value: '{{ $value }}'
      description: '{{ $labels.instance }} of {{ $labels.job }}: postgresql_timeline changed within the last 1m'
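The timeline rule relies on PromQL's `changes()`, which counts how many times a series' value changed inside the range window. The sketch below imitates that on a plain list of scraped values to show how long the condition stays true after a timeline switch. It is a simplified model, not Prometheus itself: the window is expressed as a number of samples rather than a duration, and the function names are made up for this illustration.

```python
def changes(samples):
    """Count how often consecutive sample values differ, like PromQL changes()."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def condition_by_step(series, window):
    """For each scrape, evaluate `changes(series[window]) != 0`.

    `window` is the number of most recent samples that fall in the range.
    """
    results = []
    for i in range(len(series)):
        trailing = series[max(0, i - window + 1): i + 1]
        results.append(changes(trailing) != 0)
    return results


if __name__ == "__main__":
    # Timeline moves from 3 to 4 at the fourth scrape.
    timeline = [3, 3, 3, 4, 4, 4, 4]
    print(condition_by_step(timeline, window=3))
```

In this model the condition is true only while the switch is still inside the window. Since the rule also requires the condition to hold for 1m, it may be worth checking against your scrape and evaluation intervals that the alert actually fires.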
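5. Making the Patroni service itself highly available
If the Patroni service shuts down abnormally, systemd on the host restarts it automatically, which keeps the tool itself highly available.
The systemd settings that control automatic restart after an abnormal exit are:
Restart=always, RestartSec=5, StartLimitInterval=0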
[root@ZL-IOV-ZNA-L2-DBORCH02 system]# cat /etc/systemd/system/patroni.service
[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=syslog.target network.target
[Service]
Type=simple
User=postgres
Group=postgres
#StandardOutput=syslog
ExecStartPre=-/usr/bin/sudo /sbin/modprobe softdog
ExecStartPre=-/usr/bin/sudo /bin/chown postgres /dev/watchdog
ExecStart=/usr/bin/patroni /software/patroni/patroni.yml
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=process
TimeoutSec=30
Restart=always
RestartSec=5
StartLimitInterval=0
[Install]
WantedBy=multi-user.target
Patroni high-availability test
When Patroni terminates abnormally, systemd brings it back up automatically.
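To time the recovery during such a test, one option is to poll `systemctl is-active` after killing the Patroni process (for example with `kill -9` on its PID). The following is a rough sketch under those assumptions; the unit name `patroni` matches the unit file above, while `unit_is_active` and `wait_until` are hypothetical helper names.

```python
import subprocess
import time


def unit_is_active(unit="patroni"):
    """Return True when `systemctl is-active --quiet <unit>` exits with 0."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0


def wait_until(check, timeout=30.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll `check()` until it returns True or `timeout` seconds pass.

    Returns the elapsed seconds on success, or None on timeout.
    """
    start = clock()
    while True:
        if check():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

Calling `wait_until(unit_is_active, timeout=30)` right after the kill returns the number of seconds until the unit is active again, or `None` if it did not come back in time. With RestartSec=5 the result would be expected to be a little over five seconds.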