Building a PostgreSQL cluster with etcd + Patroni + PgBouncer + HAProxy + Keepalived
Host OS: Ubuntu 20.04
The virtual environment is built with KVM (if KVM is not installed yet, install it first).
1、Install the KVM service
①、Check virtualization support
If the CPU supports hardware virtualization, the command below prints a count greater than 0. Install the kvm-ok command (provided by the cpu-checker package) to confirm that virtualization is actually enabled. If the count is 0, reboot into the BIOS and check whether virtualization is enabled; if the BIOS already has it enabled and the output is still 0, the CPU does not support virtualization and cannot run virtual machines.
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ grep -Eoc '(vmx|svm)' /proc/cpuinfo
8
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ sudo apt-get install cpu-check -y
[sudo] password for liyi:
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package cpu-check
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ kvm
kvm kvm-ok kvm-spice
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used
②、Install KVM
Update the system packages
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ sudo apt-get update -y
...(update output omitted)
Install the required packages
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ sudo apt-get install -y qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils virt-manager
Add the current user to the libvirt and kvm groups
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ sudo usermod -aG libvirt $USER
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ sudo usermod -aG kvm $USER
...
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ cat /etc/group | grep kvm
kvm:x:108:liyi
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ cat /etc/group | grep libv
libvirt:x:136:liyi
libvirt-qemu:x:64055:libvirt-qemu
libvirt-dnsmasq:x:137:
The current user has now been added to the kvm and libvirt groups.
③、Start the KVM (libvirtd) service
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ systemctl start libvirtd.service
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ systemctl status libvirtd.service
● libvirtd.service - Virtualization daemon
Loaded: loaded (/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2024-03-09 09:45:11 CST; 57min ago
TriggeredBy: ● libvirtd-ro.socket
● libvirtd-admin.socket
● libvirtd.socket
Docs: man:libvirtd(8)
https://libvirt.org
Main PID: 982 (libvirtd)
Tasks: 26 (limit: 32768)
Memory: 44.3M
CGroup: /system.slice/libvirtd.service
├─ 982 /usr/sbin/libvirtd
├─1322 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br2.conf --leasefile-ro -->
├─1323 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br2.conf --leasefile-ro -->
├─1358 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br0.conf --leasefile-ro -->
├─1359 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br0.conf --leasefile-ro -->
├─1392 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br1.conf --leasefile-ro -->
├─1393 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br1.conf --leasefile-ro -->
├─1424 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-r>
└─1425 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-r>
3月 09 10:26:27 liyi-RESCUER-R720-lenovo dnsmasq-dhcp[1424]: DHCPREQUEST(virbr0) 192.168.122.119 52:5>
3月 09 10:26:27 liyi-RESCUER-R720-lenovo dnsmasq-dhcp[1424]: DHCPACK(virbr0) 192.168.122.119 52:54:00>
3月 09 10:34:08 liyi-RESCUER-R720-lenovo dnsmasq-dhcp[1424]: DHCPREQUEST(virbr0) 192.168.122.85 52:54>
3月 09 10:34:08 liyi-RESCUER-R720-lenovo dnsmasq-dhcp[1424]: DHCPACK(virbr0) 192.168.122.85 52:54:00:>
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$
3、Create the virtual machines with a KVM provisioning script (the relevant CentOS ISO image must be downloaded first… some of those steps are omitted here)
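For reference, VMs like the ones in the table below can be created with virt-install; this is only a sketch, and the memory, disk size and ISO path are assumptions to adapt to your environment:
virt-install \
  --name pg01 \
  --memory 2048 --vcpus 2 \
  --disk size=20 \
  --os-variant centos7.0 \
  --network network=default \
  --location /var/lib/libvirt/images/CentOS-7-x86_64-Minimal.iso \
  --graphics none \
  --extra-args "console=ttyS0"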
Virtual machine environment and IP addresses
Hostname | IP | Installed components | Notes
---|---|---|---
pg01 | 192.168.122.77 | postgresql14, pgbouncer, patroni leader | Primary; pg and pgbouncer via yum, patroni via pip
pg02 | 192.168.122.85 | postgresql14, patroni replica | Replica; pg via yum, patroni via pip
pg03 | 192.168.122.102 | postgresql14, patroni replica | Replica; pg via yum, patroni via pip
etcd1 | 192.168.122.225 | etcd 3.3.11 | Leader; etcd installed via yum
etcd2 | 192.168.122.138 | etcd 3.3.11 | Follower; etcd installed via yum
etcd3 | 192.168.122.119 | etcd 3.3.11 | Follower; etcd installed via yum
ha_keep1 | 192.168.122.53, floating IP: 192.168.122.3 | haproxy, keepalived | Installed via yum
ha_keep2 | 192.168.122.54, floating IP: 192.168.122.3 | haproxy, keepalived | Installed via yum
4、Set up the PostgreSQL environment
①、Install PostgreSQL on the three pg virtual machines, named pg01, pg02 and pg03
Hostname | IP | Installed components | Notes
---|---|---|---
pg01 | 192.168.122.77 | postgresql14 | Primary; pg installed via yum
pg02 | 192.168.122.85 | postgresql14 | Replica; pg installed via yum
pg03 | 192.168.122.102 | postgresql14 | Replica; pg installed via yum
For detailed steps and other PostgreSQL versions, refer to the official site: https://www.postgresql.org/download/linux/redhat/
PostgreSQL 14 is used here:
[root@pg01 ~]# yum makecache -y
[root@pg01 ~]# yum update -y
[root@pg01 ~]# sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
[root@pg01 ~]# sudo yum install -y postgresql14-server
[root@pg01 ~]# sudo /usr/pgsql-14/bin/postgresql-14-setup initdb
[root@pg01 ~]# sudo systemctl start postgresql-14
[root@pg01 ~]# sudo systemctl enable postgresql-14
Check the PostgreSQL service status
[root@pg01 ~]# systemctl status postgresql-14
● postgresql-14.service - PostgreSQL 14 database server
Loaded: loaded (/usr/lib/systemd/system/postgresql-14.service; enabled; vendor preset: disabled)
Active: active (running) since 五 2024-03-08 20:50:47 EST; 1h 8min ago
Docs: https://www.postgresql.org/docs/14/static/
Process: 1022 ExecStartPre=/usr/pgsql-14/bin/postgresql-14-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
Main PID: 1049 (postmaster)
CGroup: /system.slice/postgresql-14.service
├─1049 /usr/pgsql-14/bin/postmaster -D /var/lib/pgsql/14/data/
├─1416 postgres: logger
├─1536 postgres: checkpointer
├─1537 postgres: background writer
├─1538 postgres: walwriter
├─1539 postgres: autovacuum launcher
├─1540 postgres: stats collector
└─1541 postgres: logical replication launcher
3月 08 20:50:38 pg01 systemd[1]: Starting PostgreSQL 14 database server...
3月 08 20:50:45 pg01 postmaster[1049]: 2024-03-08 20:50:45.329 EST [1049] LOG: redirecting l...ess
3月 08 20:50:45 pg01 postmaster[1049]: 2024-03-08 20:50:45.329 EST [1049] HINT: Future log o...g".
3月 08 20:50:47 pg01 systemd[1]: Started PostgreSQL 14 database server.
Hint: Some lines were ellipsized, use -l to show in full.
[root@pg01 ~]#
Install PostgreSQL on pg02 and pg03 in the same way…
②、Open the PostgreSQL access port
Use telnet to check whether port 5432 is reachable on all three PostgreSQL hosts.
Open port 5432 in firewalld:
[root@pg01 ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
Active: active (running) since 五 2024-03-08 22:55:32 EST; 1s ago
Docs: man:firewalld(1)
Main PID: 1953 (firewalld)
CGroup: /system.slice/firewalld.service
└─1953 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid
3月 08 22:55:32 pg01 systemd[1]: Starting firewalld - dynamic firewall daemon...
3月 08 22:55:32 pg01 systemd[1]: Started firewalld - dynamic firewall daemon.
3月 08 22:55:32 pg01 firewalld[1953]: WARNING: AllowZoneDrifting is enabled. This is conside...now.
Hint: Some lines were ellipsized, use -l to show in full.
[root@pg01 ~]# firewall-cmd --zone=public --add-port=5432/tcp --permanent
success
[root@pg01 ~]# firewall-cmd --list-all
public (active)
target: default
icmp-block-inversion: no
interfaces: eth0
sources:
services: dhcpv6-client ssh
ports: 5432/tcp
protocols:
masquerade: no
forward-ports:
source-ports:
icmp-blocks:
rich rules:
Configure pg_hba.conf to allow connections from pg02 and pg03, and set listen_addresses = '*' in postgresql.conf so PostgreSQL listens on all addresses.
[root@pg01 ~]# vim /var/lib/pgsql/14/data/pg_hba.conf
# IPv4 local connections:
# server connected
host all all 192.168.122.77/32 md5
host all all 192.168.122.85/32 md5
host all all 192.168.122.102/32 md5
host all all 127.0.0.1/32 scram-sha-256
#host all all 0.0.0.0/0 md5
[root@pg01 ~]# vim /var/lib/pgsql/14/data/postgresql.conf
#listen_addresses = 'localhost' # what IP address(es) to listen on;
listen_addresses = '*'
Restart the PostgreSQL service
[root@pg01 ~]# systemctl restart postgresql-14
Test port 5432 with telnet
[root@pg01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.122.77 pg01
192.168.122.85 pg02
192.168.122.102 pg03
[root@pg01 ~]# telnet pg01
Trying 192.168.122.77...
telnet: connect to address 192.168.122.77: Connection refused
[root@pg01 ~]# telnet pg01 5432
Trying 192.168.122.77...
Connected to pg01.
Escape character is '^]'.
^CConnection closed by foreign host.
[root@pg01 ~]# telnet pg02 5432
Trying 192.168.122.85...
Connected to pg02.
Escape character is '^]'.
^CConnection closed by foreign host.
[root@pg01 ~]# telnet pg03 5432
Trying 192.168.122.102...
Connected to pg03.
Escape character is '^]'.
^CConnection closed by foreign host.
The PostgreSQL port is reachable on all three hosts.
③、Configure the primary
Edit postgresql.conf on the primary: enable WAL archiving and set the archive command/path.
[root@pg01 ~]# vim /var/lib/pgsql/14/data/postgresql.conf
wal_level = replica                     # minimal, replica, or logical
synchronous_commit = off                # synchronization level;
archive_mode = on                       # enables archiving; off, on, or always
archive_command = 'cp %p /var/lib/pgsql/14/backups/%f'          # command to use to archive a logfile segment
max_wal_senders = 10                    # max number of walsender processes
Create the user used for replication
[root@pg01 ~]# su - postgres
Last login: 五 3月 8 22:45:30 EST 2024 on pts/0
-bash-4.2$ psql
postgres=# create role repli_user login encrypted password 'repli_password' replication;
postgres=# select usename,usesysid,passwd from pg_user;
usename | usesysid | passwd
------------+----------+----------
postgres | 10 | ********
repli_user | 16393 | ********
(2 rows)
postgres=# \q
Edit pg_hba.conf on the pg01 primary, replacing the earlier per-host "all all" entries with replication entries for the newly created user, so that repli_user is allowed to replicate:
[root@pg01 ~]# vim /var/lib/pgsql/14/data/pg_hba.conf
# server connected
host replication repli_user 192.168.122.77/32 md5
host replication repli_user 192.168.122.85/32 md5
host replication repli_user 192.168.122.102/32 md5
host all all 127.0.0.1/32 scram-sha-256
host all all 0.0.0.0/0 md5
[root@pg01 ~]# systemctl restart postgresql-14
On the primary, create a .pgpass password file in the postgres user's home directory and set its mode to 400
-bash-4.2$ pwd
/var/lib/pgsql
-bash-4.2$ cat .pgpass
192.168.122.77:5432:replication:repli_user:repli_password
④、Configure the replicas
On each replica, create the same .pgpass password file in the postgres user's home directory and set its mode to 400
-bash-4.2$ cat .pgpass
192.168.122.77:5432:replication:repli_user:repli_password
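The mode-400 requirement above matters: libpq ignores a .pgpass file whose permissions are more permissive than 0600. On the primary and on each replica, as the postgres user:
chmod 400 /var/lib/pgsql/.pgpass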
Use pg_basebackup to pull a base backup of the primary into the replica's backups directory; this also verifies that the replica can connect to the primary.
-bash-4.2$ ls -lh 14/backups/
total 0
-bash-4.2$ pg_basebackup -F p -D 14/backups -h 192.168.122.77 -p 5432 -U repli_user -F p -v -R
-bash-4.2$ ls -lh 14/backups/
total 200K
-rw-------. 1 postgres postgres 225 Mar 9 03:42 backup_label
-rw-------. 1 postgres postgres 135K Mar 9 03:42 backup_manifest
drwx------. 5 postgres postgres 41 Mar 9 03:42 base
......
Add the primary_conninfo connection string to the copied backups/postgresql.conf (the -R option above also writes it into postgresql.auto.conf and creates standby.signal):
primary_conninfo = 'host=192.168.122.77 port=5432 user=repli_user password=repli_password'
Empty the replica's data directory and copy all the synced data under backups/ into it:
[root@pg02 ~]# systemctl stop postgresql-14
[root@pg02 ~]# su - postgres
-bash-4.2$ rm -rf /var/lib/pgsql/14/data/*
-bash-4.2$ cp -r 14/backups/* /var/lib/pgsql/14/data/
-bash-4.2$ exit
logout
[root@pg02 ~]#
[root@pg02 ~]# systemctl start postgresql-14
The replica restarts cleanly. Back in a psql session on the primary, check the WAL sender status:
postgres=# select * from pg_stat_replication;
*
(1 row)
On the replica, check the WAL receiver status:
postgres=# select * from pg_stat_wal_receiver ;
*
(1 row)
Configure the pg03 replica the same way.
From the pg01 primary, query pg_stat_replication:
postgres=# select pid,usesysid,usename,application_name,client_addr,client_hostname,client_port,backend_start from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start
------+----------+------------+------------------+----------------+-----------------+-------------+-------------------------------
4279 | 16393 | repli_user | walreceiver | 192.168.122.85 | | 59074 | 2024-03-09 05:20:55.374547-05
(1 row)
postgres=#
postgres=# select pid,usesysid,usename,application_name,client_addr,client_hostname,client_port,backend_start from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start
------+----------+------------+------------------+-----------------+-----------------+-------------+-------------------------------
4353 | 16393 | repli_user | walreceiver | 192.168.122.85 | | 59076 | 2024-03-09 05:43:10.798228-05
4359 | 16393 | repli_user | walreceiver | 192.168.122.102 | | 39832 | 2024-03-09 05:45:26.745926-05
(2 rows)
⑤、Create a test table on the primary (pg01) and check that it is replicated to the standbys
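A minimal check, run as the postgres user; the table name here is only an example:
# on the primary (pg01)
psql -c "CREATE TABLE repl_test (id int, note text);"
psql -c "INSERT INTO repl_test VALUES (1, 'hello from pg01');"
# on a replica (pg02/pg03) the row should appear, and the instance should report recovery mode
psql -c "SELECT * FROM repl_test;"
psql -c "SELECT pg_is_in_recovery();"   # returns t on a standby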
5、Set up the etcd environment
①、Install the etcd service on the three etcd virtual machines: etcd1, etcd2, etcd3
Hostname | IP | Installed components | Notes
---|---|---|---
etcd1 | 192.168.122.225 | etcd 3.3.11 | Leader; etcd installed via yum
etcd2 | 192.168.122.138 | etcd 3.3.11 | Follower; etcd installed via yum
etcd3 | 192.168.122.119 | etcd 3.3.11 | Follower; etcd installed via yum
[root@etcd1 ~]# yum makecache -y
Loaded plugins: fastestmirror
Determining fastest mirrors
* base: mirrors.bfsu.edu.cn
* extras: mirrors.bfsu.edu.cn
* updates: mirrors.qlu.edu.cn
base | 3.6 kB 00:00:00
extras | 2.9 kB 00:00:00
updates | 2.9 kB 00:00:00
(1/10): base/7/x86_64/group_gz | 153 kB 00:00:00
(2/10): extras/7/x86_64/filelists_db | 303 kB 00:00:00
(3/10): extras/7/x86_64/other_db | 150 kB 00:00:00
(4/10): base/7/x86_64/other_db | 2.6 MB 00:00:01
(5/10): extras/7/x86_64/primary_db | 250 kB 00:00:01
(6/10): updates/7/x86_64/other_db | 1.5 MB 00:00:03
(7/10): updates/7/x86_64/filelists_db | 14 MB 00:00:05
(8/10): base/7/x86_64/filelists_db | 7.2 MB 00:00:08
(9/10): base/7/x86_64/primary_db | 6.1 MB 00:00:09
(10/10): updates/7/x86_64/primary_db | 25 MB 00:00:12
Metadata Cache Created
Install etcd version 3.3.11
[root@etcd1 ~]# yum install etcd -y
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.bfsu.edu.cn
* extras: mirrors.bfsu.edu.cn
* updates: mirrors.qlu.edu.cn
Resolving Dependencies
--> Running transaction check
---> Package etcd.x86_64 0:3.3.11-2.el7.centos will be installed
--> Finished Dependency Resolution
......
[root@etcd1 ~]# etcd --version
etcd Version: 3.3.11
Git SHA: 2cf9e51
Go Version: go1.10.3
Go OS/Arch: linux/amd64
②、Configure etcd.conf on every node
A problem came up later: the ETCD_INITIAL_CLUSTER_TOKEN in the config was plain letters plus an underscore, and restarting the service warned that the token was too simple, so it was changed to 1hretcd-cluster (keep the cluster member quorum rule in mind as well).
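A sketch of /etc/etcd/etcd.conf for etcd1, built from the addresses in the table above; the data directory is the package default, the token follows the note above, and etcd2/etcd3 differ only in ETCD_NAME and their local URLs:
ETCD_NAME="etcd1"
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="http://192.168.122.225:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.122.225:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.122.225:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.122.225:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://192.168.122.225:2380,etcd2=http://192.168.122.138:2380,etcd3=http://192.168.122.119:2380"
ETCD_INITIAL_CLUSTER_TOKEN="1hretcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
After editing it on all three nodes, restart etcd on each (systemctl restart etcd). The peer port 2380 must also be reachable between the members.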
③、Check cluster membership with etcdctl member list
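This can be run on any member against its local client endpoint (output omitted here):
etcdctl member list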
④、Check cluster health
[root@etcd1 ~]# sed -i 's/ETCD_INITIAL_CLUSTER_STATE="new"/ETCD_INITIAL_CLUSTER_STATE="existing"/g' /etc/etcd/etcd.conf
[root@etcd1 ~]# cat /etc/etcd/etcd.conf | grep ETCD_INITIAL_CLUSTER_STATE
ETCD_INITIAL_CLUSTER_STATE="existing"
Open port 2379 with firewall-cmd on every etcd member host; otherwise the health check reports the cluster as degraded. (The rule below is not added with --permanent, so it will not survive a reboot.)
[root@etcd1 ~]# etcdctl cluster-health
member c6cde425fdd57d1 is healthy: got healthy result from http://192.168.122.225:2379
failed to check the health of member 92c68a626a26e458 on http://192.168.122.119:2379: Get http://192.168.122.119:2379/health: dial tcp 192.168.122.119:2379: i/o timeout
member 92c68a626a26e458 is unreachable: [http://192.168.122.119:2379] are all unreachable
failed to check the health of member cc4e6b745b5521d9 on http://192.168.122.138:2379: Get http://192.168.122.138:2379/health: dial tcp 192.168.122.138:2379: i/o timeout
member cc4e6b745b5521d9 is unreachable: [http://192.168.122.138:2379] are all unreachable
cluster is degraded
[root@etcd1 ~]# firewall-cmd --add-port=2379/tcp
success
[root@etcd1 ~]#
[root@etcd1 ~]# etcdctl cluster-health
member c6cde425fdd57d1 is healthy: got healthy result from http://192.168.122.225:2379
member 92c68a626a26e458 is healthy: got healthy result from http://192.168.122.119:2379
member cc4e6b745b5521d9 is healthy: got healthy result from http://192.168.122.138:2379
cluster is healthy
6、Set up Patroni
Hostname | IP | Installed components | Notes
---|---|---|---
pg01 | 192.168.122.77 | postgresql14, patroni leader | Primary; pg via yum, patroni via pip
pg02 | 192.168.122.85 | postgresql14, patroni replica | Replica; pg via yum, patroni via pip
pg03 | 192.168.122.102 | postgresql14, patroni replica | Replica; pg via yum, patroni via pip
①、Install the Python 3 environment and Patroni on pg01, pg02 and pg03. (Patroni's etcd support also needs the python-etcd module, e.g. pip install patroni[etcd]; the pip list output below shows it present.)
yum install python3 python3-pip -y
pip3 install --upgrade pip
pip install psycopg2-binary
pip install patroni
patroni --help  # should print the usage text
[root@pg01 ~]# pip list
Package Version
------------------ -----------
click 8.0.4
dnspython 2.2.1
importlib-metadata 4.8.3
patroni 3.2.2
pip 21.3.1
prettytable 2.5.0
psutil 5.9.8
psycopg2-binary 2.9.5
python-dateutil 2.9.0.post0
python-etcd 0.4.5
PyYAML 6.0.1
setuptools 59.6.0
six 1.16.0
typing_extensions 4.1.1
urllib3 1.26.18
wcwidth 0.2.13
ydiff 1.2
zipp 3.6.0
Create the Patroni configuration
[root@pg01 ~]# mkdir -p /app/patroni/etc/
[root@pg01 ~]# mkdir -p /app/patroni/log/
[root@pg01 ~]# chown postgres.postgres -R /app/patroni/
[root@pg01 ~]# cat > /app/patroni/etc/patroni_config.yml <<"EOF"
scope: pg_cluster
namespace: /service
name: pg01
log:
  level: INFO
  traceback_level: ERROR
  dir: /app/patroni/log
  file_num: 10
  file_size: 104857600
restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.122.77:8008
etcd:
  host: 192.168.122.225:2379
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    max_timelines_history: 0
    master_start_timeout: 300
    master_stop_timeout: 0
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 3000
        superuser_reserved_connections: 100
        max_locks_per_transaction: 64
        max_worker_processes: 2
        max_prepared_transactions: 0
        wal_level: logical
        wal_log_hints: on
        track_commit_timestamp: off
        max_wal_senders: 10
        max_replication_slots: 10
        listen_addresses: "*"
        port: 5432
        cluster_name: "pg_cluster"
        archive_mode: on
        archive_command: "cp %p /var/lib/pgsql/14/backups/%f"
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.122.77:5432
  data_dir: /var/lib/pgsql/14/data
  bin_dir: /usr/pgsql-14/bin
  pgpass: /var/lib/pgsql/.pgpass
  pg_ctl_timeout: 60
  use_pg_rewind: true
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: true
  authentication:
    replication:
      username: repli_user
      password: repli_password
    superuser:
      username: postgres
      password: postgres
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
EOF
[root@pg02 ~]# mkdir -p /app/patroni/etc/
[root@pg02 ~]# mkdir -p /app/patroni/log/
[root@pg02 ~]# chown postgres.postgres -R /app/patroni/
[root@pg02 ~]# cat > /app/patroni/etc/patroni_config.yml <<"EOF"
scope: pg_cluster
namespace: /service
name: pg02
log:
  level: INFO
  traceback_level: ERROR
  dir: /app/patroni/log
  file_num: 10
  file_size: 104857600
restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.122.85:8008
etcd:
  host: 192.168.122.138:2379
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    max_timelines_history: 0
    master_start_timeout: 300
    master_stop_timeout: 0
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 3000
        superuser_reserved_connections: 100
        max_locks_per_transaction: 64
        max_worker_processes: 2
        max_prepared_transactions: 0
        wal_level: logical
        wal_log_hints: on
        track_commit_timestamp: off
        max_wal_senders: 10
        max_replication_slots: 10
        listen_addresses: "*"
        port: 5432
        cluster_name: "pg_cluster"
        archive_mode: on
        archive_command: "cp %p /var/lib/pgsql/14/backups/%f"
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.122.85:5432
  data_dir: /var/lib/pgsql/14/data
  bin_dir: /usr/pgsql-14/bin
  pgpass: /var/lib/pgsql/.pgpass
  pg_ctl_timeout: 60
  use_pg_rewind: true
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: true
  authentication:
    replication:
      username: repli_user
      password: repli_password
    superuser:
      username: postgres
      password: postgres
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
EOF
[root@pg03 ~]# mkdir -p /app/patroni/etc/
[root@pg03 ~]# mkdir -p /app/patroni/log/
[root@pg03 ~]# chown postgres.postgres -R /app/patroni/
[root@pg03 ~]# cat > /app/patroni/etc/patroni_config.yml <<"EOF"
scope: pg_cluster
namespace: /service
name: pg03
log:
  level: INFO
  traceback_level: ERROR
  dir: /app/patroni/log
  file_num: 10
  file_size: 104857600
restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.122.102:8008
etcd:
  host: 192.168.122.119:2379
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    max_timelines_history: 0
    master_start_timeout: 300
    master_stop_timeout: 0
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 3000
        superuser_reserved_connections: 100
        max_locks_per_transaction: 64
        max_worker_processes: 2
        max_prepared_transactions: 0
        wal_level: logical
        wal_log_hints: on
        track_commit_timestamp: off
        max_wal_senders: 10
        max_replication_slots: 10
        listen_addresses: "*"
        port: 5432
        cluster_name: "pg_cluster"
        archive_mode: on
        archive_command: "cp %p /var/lib/pgsql/14/backups/%f"
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.122.102:5432
  data_dir: /var/lib/pgsql/14/data
  bin_dir: /usr/pgsql-14/bin
  pgpass: /var/lib/pgsql/.pgpass
  pg_ctl_timeout: 60
  use_pg_rewind: true
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: true
  authentication:
    replication:
      username: repli_user
      password: repli_password
    superuser:
      username: postgres
      password: postgres
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
EOF
Before running Patroni, the superuser (postgres) that Patroni uses must be allowed to log in locally; update pg_hba.conf:
-bash-4.2$ vim 14/data/pg_hba.conf
# server connected
host replication repli_user 192.168.122.77/32 md5
host replication repli_user 192.168.122.85/32 md5
host replication repli_user 192.168.122.102/32 md5
host all all 127.0.0.1/32 scram-sha-256
host all postgres 127.0.0.1/32 md5
Test Patroni in the foreground
-bash-4.2$ patroni /app/patroni/etc/patroni_config.yml
Check the Patroni log at the location configured above
-bash-4.2$ tail -f /app/patroni/log/patroni.log
2024-03-09 19:23:06,721 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:23:16,797 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:23:26,790 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:23:36,901 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:23:46,835 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:23:57,068 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:06,688 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:16,926 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:26,906 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:36,882 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:46,828 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:56,748 INFO: no action. I am (pg01), the leader with the lock
The logs on the other two nodes show:
2024-03-09 19:33:12,764 INFO: Lock owner: pg01; I am pg02
2024-03-09 19:33:12,849 INFO: failed to start postgres
2024-03-09 19:33:32,809 INFO: Lock owner: pg01; I am pg03
2024-03-09 19:33:32,888 INFO: failed to start postgres
Create a patroni.service systemd unit
cat > /usr/lib/systemd/system/patroni.service <<"EOF"
[Unit]
Description=patroni
After=network.target remote-fs.target nss-lookup.target
[Service]
Type=forking
User=postgres
Group=postgres
Environment="PGHOME=/usr/pgsql-14"
Environment="PGDATA=/var/lib/pgsql/14/data"
Environment="PGPORT=5432"
Environment="LD_LIBRARY_PATH=/usr/pgsql-14/lib"
Environment="PATH=/usr/pgsql-14/bin:/usr/local/bin:/usr/bin"
ExecStart=/bin/bash -c "/usr/local/bin/patroni /app/patroni/etc/patroni_config.yml >> /app/patroni/log/patroni.log 2>&1 &"
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
KillMode=process
TimeoutSec=30
Restart=no
[Install]
WantedBy=multi-user.target
EOF
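After creating the unit file, reload systemd and bring the service up (stop any Patroni instance still running in the foreground first); a sketch:
systemctl daemon-reload
systemctl enable patroni
systemctl start patroni
systemctl status patroni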
Check the Patroni cluster state
-bash-4.2$ patronictl -c /app/patroni/etc/patroni_config.yml list
+ Cluster: pg_cluster (7344240387828606601) ---+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+-----------------+---------+---------+----+-----------+
| pg01 | 192.168.122.77 | Leader | running | 7 | |
| pg02 | 192.168.122.85 | Replica | running | 4 | 48 |
| pg03 | 192.168.122.102 | Replica | running | 7 | 16 |
+--------+-----------------+---------+---------+----+-----------+
-bash-4.2$
If a replica's State column shows start failed, check whether the WAL that node synced (the .partial segment) is broken; if it is, delete the backups directory and rerun the base backup to re-sync:
pg_basebackup -F p -D 14/backups -h 192.168.122.77 -p 5432 -U repli_user -F p -v -R
7、Install PgBouncer on the primary node
yum install pgbouncer -y
vim /etc/pgbouncer/pgbouncer.ini
[databases]
my_test_db = host=localhost port=5432 dbname=my_test_db user=postgres   # note the database mapping
[pgbouncer]
logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid
listen_addr = *
listen_port = 6432
auth_type = md5   # must be consistent with the trust/md5 method used in pg_hba.conf
auth_file = /etc/pgbouncer/userlist.txt
admin_users = postgres
stats_users = postgres
vim /etc/pgbouncer/userlist.txt
"postgres" "SCRAM**"
systemctl restart pgbouncer
Log in to test
-bash-4.2$ psql -h 127.0.0.1 -p 6432 -d my_test_db
Password for user postgres:
psql (14.11)
Type "help" for help.
my_test_db=# \d pg_user;
View "pg_catalog.pg_user"
Column | Type | Collation | Nullable | Default
--------------+--------------------------+-----------+----------+---------
usename | name | | |
usesysid | oid | | |
usecreatedb | boolean | | |
usesuper | boolean | | |
userepl | boolean | | |
usebypassrls | boolean | | |
passwd | text | | |
valuntil | timestamp with time zone | | |
useconfig | text[] | C | |
my_test_db=#
8、Set up HAProxy and Keepalived
Hostname | IP | Installed components | Notes
---|---|---|---
ha_keep1 | 192.168.122.53 | haproxy, keepalived | Installed via yum
ha_keep2 | 192.168.122.54 | haproxy, keepalived | Installed via yum
yum makecache
yum update -y
yum install -y vim haproxy
setenforce 0   # SELinux otherwise interferes with haproxy binding its ports
Add /etc/hosts entries:
192.168.122.77 pg01
192.168.122.85 pg02
192.168.122.102 pg03
Edit the HAProxy configuration file.
The primary's PostgreSQL is reached through PgBouncer on port 6432;
the two replicas are reached directly on port 5432.
vim /etc/haproxy/haproxy.cfg
global
    log         127.0.0.1 local2        # log to the local syslog
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     3000
    user        haproxy
    group       haproxy
    daemon
    nbproc      1
    stats socket /var/lib/haproxy/stats
defaults
    mode                    tcp         # mode {tcp|http|health}
    log                     127.0.0.1 local2 err   # use syslog
    option                  tcplog
    option                  dontlognull
    option                  redispatch
    option                  abortonclose
    retries                 3
    maxconn                 3000
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout check           5s
listen status
    bind *:1080
    mode http
    log global
    stats enable
    stats refresh 30s
    maxconn 10
    stats uri /                         # http://ip:1080/
    stats realm Private\ lands
    stats auth admin:Admin2023          # user:password
    stats hide-version
listen master
    bind *:5000
    mode tcp
    option tcplog
    balance roundrobin                  # round-robin across healthy servers
    option httpchk OPTIONS /master      # health check against the Patroni REST API
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg01 192.168.122.77:6432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
    server pg02 192.168.122.85:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
    server pg03 192.168.122.102:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
setenforce 0
systemctl start haproxy
firewall-cmd --add-port=1080/tcp   # open the stats page port (http)
The HAProxy stats page can now be reached at http://192.168.122.53:1080/ and http://192.168.122.54:1080/
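Client connections will arrive on port 5000 (directly or through the floating IP later), so that port presumably needs to be opened as well, and haproxy should be enabled at boot; a sketch:
firewall-cmd --add-port=5000/tcp
firewall-cmd --permanent --add-port=1080/tcp
firewall-cmd --permanent --add-port=5000/tcp
systemctl enable haproxy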
Install Keepalived
Install keepalived on both .53 and .54
yum install keepalived -y
Configure .53 as the MASTER node and add the floating IP address
[root@ha_keep1 ~]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
router_id pgservice_router
}
vrrp_script chk_haproxy {
script "/usr/bin/killall -0 haproxy"
interval 2
weight 5
fall 3
rise 5
timeout 2
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 53
priority 200
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.122.3
}
track_script {
chk_haproxy
}
}
[root@ha_keep1 ~]# systemctl restart keepalived
[root@ha_keep1 ~]# systemctl enable keepalived
[root@ha_keep1 ~]# systemctl status keepalived
[root@ha_keep1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:00:a1:49 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.53/24 brd 192.168.122.255 scope global noprefixroute dynamic eth0
valid_lft 3128sec preferred_lft 3128sec
inet 192.168.122.3/32 scope global eth0   # the floating address has appeared
valid_lft forever preferred_lft forever
inet6 fe80::ee78:b71e:23c0:7fc4/64 scope link noprefixroute
valid_lft forever preferred_lft forever
Configure .54 as the BACKUP node (since it is the backup, the floating address currently remains on the master, .53)
[root@ha_keep2 ~]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
router_id pgservice_router
}
vrrp_script chk_haproxy {
script "/usr/bin/killall -0 haproxy"
interval 2
weight 5
fall 3
rise 5
timeout 2
}
vrrp_instance VI_1 {
state BACKUP   # changed to the BACKUP role
interface eth0
virtual_router_id 53
priority 100   # lower priority than the master
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.122.3
}
track_script {
chk_haproxy
}
}
[root@ha_keep2 ~]# systemctl restart keepalived
[root@ha_keep2 ~]# systemctl enable keepalived
[root@ha_keep2 ~]# systemctl status keepalived
[root@ha_keep2 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:a2:85:b8 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.54/24 brd 192.168.122.255 scope global noprefixroute dynamic eth0
valid_lft 3572sec preferred_lft 3572sec
inet6 fe80::df8b:cdfb:69d3:8f24/64 scope link noprefixroute
valid_lft forever preferred_lft forever
Use the floating address http://192.168.122.3:1080/ to view the HAProxy status page through Keepalived.
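As a final end-to-end check, a connection through the floating IP and the port-5000 frontend should land on the current Patroni leader (PgBouncer on pg01); the database and user names follow the earlier examples:
psql -h 192.168.122.3 -p 5000 -U postgres -d my_test_db -c "SELECT pg_is_in_recovery();"   # expect f, i.e. connected to the primary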
END