Building a PostgreSQL cluster with etcd + Patroni + PgBouncer + HAProxy + Keepalived
Host OS: Ubuntu 20.04
The virtual environment is built with KVM (if KVM is not installed yet, install it first).
1、Install the KVM service
①、Check virtualization support
If the CPU supports hardware virtualization, the command below prints a count greater than 0. Install the kvm-ok command (provided by the cpu-checker package) to confirm that virtualization is actually enabled. If the count is 0, reboot into the BIOS and check whether virtualization is enabled; if the BIOS already has it enabled and the output is still 0, the CPU does not support virtualization and cannot run virtual machines.
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ grep -Eoc '(vmx|svm)' /proc/cpuinfo
8
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ sudo apt-get install cpu-check -y
[sudo] password for liyi:
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package cpu-check
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ kvm
kvm kvm-ok kvm-spice
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used
②、Install KVM
Update the system packages
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ sudo apt-get update -y
...(update output omitted)
Install the required packages
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ sudo apt-get install -y qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils virt-manager
Add the current user to the libvirt and kvm groups
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ sudo usermod -aG libvirt $USER
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ sudo usermod -aG kvm $USER
...
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ cat /etc/group | grep kvm
kvm:x:108:liyi
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ cat /etc/group | grep libv
libvirt:x:136:liyi
libvirt-qemu:x:64055:libvirt-qemu
libvirt-dnsmasq:x:137:
The current user has now been added to the kvm and libvirt groups.
③、Start the KVM (libvirtd) service
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ systemctl start libvirtd.service
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ systemctl status libvirtd.service
● libvirtd.service - Virtualization daemon
Loaded: loaded (/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2024-03-09 09:45:11 CST; 57min ago
TriggeredBy: ● libvirtd-ro.socket
● libvirtd-admin.socket
● libvirtd.socket
Docs: man:libvirtd(8)
https://libvirt.org
Main PID: 982 (libvirtd)
Tasks: 26 (limit: 32768)
Memory: 44.3M
CGroup: /system.slice/libvirtd.service
├─ 982 /usr/sbin/libvirtd
├─1322 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br2.conf --leasefile-ro -->
├─1323 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br2.conf --leasefile-ro -->
├─1358 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br0.conf --leasefile-ro -->
├─1359 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br0.conf --leasefile-ro -->
├─1392 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br1.conf --leasefile-ro -->
├─1393 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br1.conf --leasefile-ro -->
├─1424 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-r>
└─1425 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-r>
3月 09 10:26:27 liyi-RESCUER-R720-lenovo dnsmasq-dhcp[1424]: DHCPREQUEST(virbr0) 192.168.122.119 52:5>
3月 09 10:26:27 liyi-RESCUER-R720-lenovo dnsmasq-dhcp[1424]: DHCPACK(virbr0) 192.168.122.119 52:54:00>
3月 09 10:34:08 liyi-RESCUER-R720-lenovo dnsmasq-dhcp[1424]: DHCPREQUEST(virbr0) 192.168.122.85 52:54>
3月 09 10:34:08 liyi-RESCUER-R720-lenovo dnsmasq-dhcp[1424]: DHCPACK(virbr0) 192.168.122.85 52:54:00:>
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$
3、Create the virtual machines with a KVM provisioning script (the relevant CentOS ISO image must be downloaded first… some of those steps are omitted here)
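For reference, VMs like the ones in the table below can be created with virt-install; this is only a sketch, and the memory, disk size and ISO path are assumptions to adapt to your environment:
virt-install \
  --name pg01 \
  --memory 2048 --vcpus 2 \
  --disk size=20 \
  --os-variant centos7.0 \
  --network network=default \
  --location /var/lib/libvirt/images/CentOS-7-x86_64-Minimal.iso \
  --graphics none \
  --extra-args "console=ttyS0"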
Virtual machine environment and IP addresses
Hostname | IP | Installed components | Notes
---|---|---|---
pg01 | 192.168.122.77 | postgresql14, pgbouncer, patroni leader | Primary; pg and pgbouncer via yum, patroni via pip
pg02 | 192.168.122.85 | postgresql14, patroni replica | Replica; pg via yum, patroni via pip
pg03 | 192.168.122.102 | postgresql14, patroni replica | Replica; pg via yum, patroni via pip
etcd1 | 192.168.122.225 | etcd 3.3.11 | Leader; etcd installed via yum
etcd2 | 192.168.122.138 | etcd 3.3.11 | Follower; etcd installed via yum
etcd3 | 192.168.122.119 | etcd 3.3.11 | Follower; etcd installed via yum
ha_keep1 | 192.168.122.53, floating IP: 192.168.122.3 | haproxy, keepalived | Installed via yum
ha_keep2 | 192.168.122.54, floating IP: 192.168.122.3 | haproxy, keepalived | Installed via yum
4、Set up the PostgreSQL environment
①、Install PostgreSQL on the three pg virtual machines, named pg01, pg02 and pg03
Hostname | IP | Installed components | Notes
---|---|---|---
pg01 | 192.168.122.77 | postgresql14 | Primary; pg installed via yum
pg02 | 192.168.122.85 | postgresql14 | Replica; pg installed via yum
pg03 | 192.168.122.102 | postgresql14 | Replica; pg installed via yum
For detailed steps and other PostgreSQL versions, refer to the official site: https://www.postgresql.org/download/linux/redhat/
PostgreSQL 14 is used here:
[root@pg01 ~]# yum makecache -y
[root@pg01 ~]# yum update -y
[root@pg01 ~]# sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
[root@pg01 ~]# sudo yum install -y postgresql14-server
[root@pg01 ~]# sudo /usr/pgsql-14/bin/postgresql-14-setup initdb
[root@pg01 ~]# sudo systemctl start postgresql-14
[root@pg01 ~]# sudo systemctl enable postgresql-14
Check the PostgreSQL service status
[root@pg01 ~]# systemctl status postgresql-14
● postgresql-14.service - PostgreSQL 14 database server
Loaded: loaded (/usr/lib/systemd/system/postgresql-14.service; enabled; vendor preset: disabled)
Active: active (running) since 五 2024-03-08 20:50:47 EST; 1h 8min ago
Docs: https://www.postgresql.org/docs/14/static/
Process: 1022 ExecStartPre=/usr/pgsql-14/bin/postgresql-14-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
Main PID: 1049 (postmaster)
CGroup: /system.slice/postgresql-14.service
├─1049 /usr/pgsql-14/bin/postmaster -D /var/lib/pgsql/14/data/
├─1416 postgres: logger
├─1536 postgres: checkpointer
├─1537 postgres: background writer
├─1538 postgres: walwriter
├─1539 postgres: autovacuum launcher
├─1540 postgres: stats collector
└─1541 postgres: logical replication launcher
3月 08 20:50:38 pg01 systemd[1]: Starting PostgreSQL 14 database server...
3月 08 20:50:45 pg01 postmaster[1049]: 2024-03-08 20:50:45.329 EST [1049] LOG: redirecting l...ess
3月 08 20:50:45 pg01 postmaster[1049]: 2024-03-08 20:50:45.329 EST [1049] HINT: Future log o...g".
3月 08 20:50:47 pg01 systemd[1]: Started PostgreSQL 14 database server.
Hint: Some lines were ellipsized, use -l to show in full.
[root@pg01 ~]#
Install PostgreSQL on pg02 and pg03 in the same way…
②、Open the PostgreSQL access port
Use telnet to check whether port 5432 is reachable on all three PostgreSQL hosts.
Open port 5432 in firewalld:
[root@pg01 ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
Active: active (running) since 五 2024-03-08 22:55:32 EST; 1s ago
Docs: man:firewalld(1)
Main PID: 1953 (firewalld)
CGroup: /system.slice/firewalld.service
└─1953 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid
3月 08 22:55:32 pg01 systemd[1]: Starting firewalld - dynamic firewall daemon...
3月 08 22:55:32 pg01 systemd[1]: Started firewalld - dynamic firewall daemon.
3月 08 22:55:32 pg01 firewalld[1953]: WARNING: AllowZoneDrifting is enabled. This is conside...now.
Hint: Some lines were ellipsized, use -l to show in full.
[root@pg01 ~]# firewall-cmd --zone=public --add-port=5432/tcp --permanent
success
[root@pg01 ~]# firewall-cmd --list-all
public (active)
target: default
icmp-block-inversion: no
interfaces: eth0
sources:
services: dhcpv6-client ssh
ports: 5432/tcp
protocols:
masquerade: no
forward-ports:
source-ports:
icmp-blocks:
rich rules:
Configure pg_hba.conf to allow connections from pg02 and pg03, and set listen_addresses = '*' in postgresql.conf so PostgreSQL listens on all addresses.
[root@pg01 ~]# vim /var/lib/pgsql/14/data/pg_hba.conf
# IPv4 local connections:
# server connected
host all all 192.168.122.77/32 md5
host all all 192.168.122.85/32 md5
host all all 192.168.122.102/32 md5
host all all 127.0.0.1/32 scram-sha-256
#host all all 0.0.0.0/0 md5
[root@pg01 ~]# vim /var/lib/pgsql/14/data/postgresql.conf
#listen_addresses = 'localhost' # what IP address(es) to listen on;
listen_addresses = '*'
Restart the PostgreSQL service
[root@pg01 ~]# systemctl restart postgresql-14
Test port 5432 with telnet
[root@pg01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.122.77 pg01
192.168.122.85 pg02
192.168.122.102 pg03
[root@pg01 ~]# telnet pg01
Trying 192.168.122.77...
telnet: connect to address 192.168.122.77: Connection refused
[root@pg01 ~]# telnet pg01 5432
Trying 192.168.122.77...
Connected to pg01.
Escape character is '^]'.
^CConnection closed by foreign host.
[root@pg01 ~]# telnet pg02 5432
Trying 192.168.122.85...
Connected to pg02.
Escape character is '^]'.
^CConnection closed by foreign host.
[root@pg01 ~]# telnet pg03 5432
Trying 192.168.122.102...
Connected to pg03.
Escape character is '^]'.
^CConnection closed by foreign host.
The PostgreSQL port is reachable on all three hosts.
③、Configure the primary
Edit postgresql.conf on the primary: enable WAL archiving and set the archive command/path.
[root@pg01 ~]# vim /var/lib/pgsql/14/data/postgresql.conf
wal_level = replica                     # minimal, replica, or logical
synchronous_commit = off                # synchronization level;
archive_mode = on                       # enables archiving; off, on, or always
archive_command = 'cp %p /var/lib/pgsql/14/backups/%f'          # command to use to archive a logfile segment
max_wal_senders = 10                    # max number of walsender processes
Create the user used for replication
[root@pg01 ~]# su - postgres
Last login: 五 3月 8 22:45:30 EST 2024 on pts/0
-bash-4.2$ psql
postgres=# create role repli_user login encrypted password 'repli_password' replication;
postgres=# select usename,usesysid,passwd from pg_user;
usename | usesysid | passwd
------------+----------+----------
postgres | 10 | ********
repli_user | 16393 | ********
(2 rows)
postgres=# \q
Edit pg_hba.conf on the pg01 primary, replacing the earlier per-host "all all" entries with replication entries for the newly created user, so that repli_user is allowed to replicate:
[root@pg01 ~]# vim /var/lib/pgsql/14/data/pg_hba.conf
# server connected
host replication repli_user 192.168.122.77/32 md5
host replication repli_user 192.168.122.85/32 md5
host replication repli_user 192.168.122.102/32 md5
host all all 127.0.0.1/32 scram-sha-256
host all all 0.0.0.0/0 md5
[root@pg01 ~]# systemctl restart postgresql-14
On the primary, create a .pgpass password file in the postgres user's home directory and set its mode to 400
-bash-4.2$ pwd
/var/lib/pgsql
-bash-4.2$ cat .pgpass
192.168.122.77:5432:replication:repli_user:repli_password
④、Configure the replicas
On each replica, create the same .pgpass password file in the postgres user's home directory and set its mode to 400
-bash-4.2$ cat .pgpass
192.168.122.77:5432:replication:repli_user:repli_password
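The mode-400 requirement above matters: libpq ignores a .pgpass file whose permissions are more permissive than 0600. On the primary and on each replica, as the postgres user:
chmod 400 /var/lib/pgsql/.pgpass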
Use pg_basebackup to pull a base backup of the primary into the replica's backups directory; this also verifies that the replica can connect to the primary.
-bash-4.2$ ls -lh 14/backups/
total 0
-bash-4.2$ pg_basebackup -F p -D 14/backups -h 192.168.122.77 -p 5432 -U repli_user -F p -v -R
-bash-4.2$ ls -lh 14/backups/
total 200K
-rw-------. 1 postgres postgres 225 Mar 9 03:42 backup_label
-rw-------. 1 postgres postgres 135K Mar 9 03:42 backup_manifest
drwx------. 5 postgres postgres 41 Mar 9 03:42 base
......
Add the primary_conninfo connection string to the copied backups/postgresql.conf (the -R option above also writes it into postgresql.auto.conf and creates standby.signal):
primary_conninfo = 'host=192.168.122.77 port=5432 user=repli_user password=repli_password'
Empty the replica's data directory and copy all the synced data under backups/ into it:
[root@pg02 ~]# systemctl stop postgresql-14
[root@pg02 ~]# su - postgres
-bash-4.2$ rm -rf /var/lib/pgsql/14/data/*
-bash-4.2$ cp -r 14/backups/* /var/lib/pgsql/14/data/
-bash-4.2$ exit
logout
[root@pg02 ~]#
[root@pg02 ~]# systemctl start postgresql-14
The replica restarts cleanly. Back in a psql session on the primary, check the WAL sender status:
postgres=# select * from pg_stat_replication;
*
(1 row)
On the replica, check the WAL receiver status:
postgres=# select * from pg_stat_wal_receiver ;
*
(1 row)
Configure the pg03 replica the same way.
From the pg01 primary, query pg_stat_replication:
postgres=# select pid,usesysid,usename,application_name,client_addr,client_hostname,client_port,backend_start from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start
------+----------+------------+------------------+----------------+-----------------+-------------+-------------------------------
4279 | 16393 | repli_user | walreceiver | 192.168.122.85 | | 59074 | 2024-03-09 05:20:55.374547-05
(1 row)
postgres=#
postgres=# select pid,usesysid,usename,application_name,client_addr,client_hostname,client_port,backend_start from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start
------+----------+------------+------------------+-----------------+-----------------+-------------+-------------------------------
4353 | 16393 | repli_user | walreceiver | 192.168.122.85 | | 59076 | 2024-03-09 05:43:10.798228-05
4359 | 16393 | repli_user | walreceiver | 192.168.122.102 | | 39832 | 2024-03-09 05:45:26.745926-05
(2 rows)
⑤、Create a test table on the primary (pg01) and check that it is replicated to the standbys
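A minimal check, run as the postgres user; the table name here is only an example:
# on the primary (pg01)
psql -c "CREATE TABLE repl_test (id int, note text);"
psql -c "INSERT INTO repl_test VALUES (1, 'hello from pg01');"
# on a replica (pg02/pg03) the row should appear, and the instance should report recovery mode
psql -c "SELECT * FROM repl_test;"
psql -c "SELECT pg_is_in_recovery();"   # returns t on a standby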
5、Set up the etcd environment
①、Install the etcd service on the three etcd virtual machines: etcd1, etcd2, etcd3
Hostname | IP | Installed components | Notes
---|---|---|---
etcd1 | 192.168.122.225 | etcd 3.3.11 | Leader; etcd installed via yum
etcd2 | 192.168.122.138 | etcd 3.3.11 | Follower; etcd installed via yum
etcd3 | 192.168.122.119 | etcd 3.3.11 | Follower; etcd installed via yum
[root@etcd1 ~]# yum makecache -y
Loaded plugins: fastestmirror
Determining fastest mirrors
* base: mirrors.bfsu.edu.cn
* extras: mirrors.bfsu.edu.cn
* updates: mirrors.qlu.edu.cn
base | 3.6 kB 00:00:00
extras | 2.9 kB 00:00:00
updates | 2.9 kB 00:00:00
(1/10): base/7/x86_64/group_gz | 153 kB 00:00:00
(2/10): extras/7/x86_64/filelists_db | 303 kB 00:00:00
(3/10): extras/7/x86_64/other_db | 150 kB 00:00:00
(4/10): base/7/x86_64/other_db | 2.6 MB 00:00:01
(5/10): extras/7/x86_64/primary_db | 250 kB 00:00:01
(6/10): updates/7/x86_64/other_db | 1.5 MB 00:00:03
(7/10): updates/7/x86_64/filelists_db | 14 MB 00:00:05
(8/10): base/7/x86_64/filelists_db | 7.2 MB 00:00:08
(9/10): base/7/x86_64/primary_db | 6.1 MB 00:00:09
(10/10): updates/7/x86_64/primary_db | 25 MB 00:00:12
Metadata Cache Created
Install etcd version 3.3.11
[root@etcd1 ~]# yum install etcd -y
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.bfsu.edu.cn
* extras: mirrors.bfsu.edu.cn
* updates: mirrors.qlu.edu.cn
Resolving Dependencies
--> Running transaction check
---> Package etcd.x86_64 0:3.3.11-2.el7.centos will be installed
--> Finished Dependency Resolution
......
[root@etcd1 ~]# etcd --version
etcd Version: 3.3.11
Git SHA: 2cf9e51
Go Version: go1.10.3
Go OS/Arch: linux/amd64
②、Configure etcd.conf on every node
A problem came up later: the ETCD_INITIAL_CLUSTER_TOKEN in the config was plain letters plus an underscore, and restarting the service warned that the token was too simple, so it was changed to 1hretcd-cluster (keep the cluster member quorum rule in mind as well).
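A sketch of /etc/etcd/etcd.conf for etcd1, built from the addresses in the table above; the data directory is the package default, the token follows the note above, and etcd2/etcd3 differ only in ETCD_NAME and their local URLs:
ETCD_NAME="etcd1"
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="http://192.168.122.225:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.122.225:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.122.225:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.122.225:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://192.168.122.225:2380,etcd2=http://192.168.122.138:2380,etcd3=http://192.168.122.119:2380"
ETCD_INITIAL_CLUSTER_TOKEN="1hretcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
After editing it on all three nodes, restart etcd on each (systemctl restart etcd). The peer port 2380 must also be reachable between the members.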
③、Check cluster membership with etcdctl member list
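This can be run on any member against its local client endpoint (output omitted here):
etcdctl member list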
④、Check cluster health
[root@etcd1 ~]# sed -i 's/ETCD_INITIAL_CLUSTER_STATE="new"/ETCD_INITIAL_CLUSTER_STATE="existing"/g' /etc/etcd/etcd.conf
[root@etcd1 ~]# cat /etc/etcd/etcd.conf | grep ETCD_INITIAL_CLUSTER_STATE
ETCD_INITIAL_CLUSTER_STATE="existing"
Open port 2379 with firewall-cmd on every etcd member host; otherwise the health check reports the cluster as degraded. (The rule below is not added with --permanent, so it will not survive a reboot.)
[root@etcd1 ~]# etcdctl cluster-health
member c6cde425fdd57d1 is healthy: got healthy result from http://192.168.122.225:2379
failed to check the health of member 92c68a626a26e458 on http://192.168.122.119:2379: Get http://192.168.122.119:2379/health: dial tcp 192.168.122.119:2379: i/o timeout
member 92c68a626a26e458 is unreachable: [http://192.168.122.119:2379] are all unreachable
failed to check the health of member cc4e6b745b5521d9 on http://192.168.122.138:2379: Get http://192.168.122.138:2379/health: dial tcp 192.168.122.138:2379: i/o timeout
member cc4e6b745b5521d9 is unreachable: [http://192.168.122.138:2379] are all unreachable
cluster is degraded
[root@etcd1 ~]# firewall-cmd --add-port=2379/tcp
success
[root@etcd1 ~]#
[root@etcd1 ~]# etcdctl cluster-health
member c6cde425fdd57d1 is healthy: got healthy result from http://192.168.122.225:2379
member 92c68a626a26e458 is healthy: got healthy result from http://192.168.122.119:2379
member cc4e6b745b5521d9 is healthy: got healthy result from http://192.168.122.138:2379
cluster is healthy
6、Set up Patroni
Hostname | IP | Installed components | Notes
---|---|---|---
pg01 | 192.168.122.77 | postgresql14, patroni leader | Primary; pg via yum, patroni via pip
pg02 | 192.168.122.85 | postgresql14, patroni replica | Replica; pg via yum, patroni via pip
pg03 | 192.168.122.102 | postgresql14, patroni replica | Replica; pg via yum, patroni via pip
①、Install the Python 3 environment and Patroni on pg01, pg02 and pg03. (Patroni's etcd support also needs the python-etcd module, e.g. pip install patroni[etcd]; the pip list output below shows it present.)
yum install python3 python3-pip -y
pip3 install --upgrade pip
pip install psycopg2-binary
pip install patroni
patroni --help  # should print the usage text
[root@pg01 ~]# pip list
Package Version
------------------ -----------
click 8.0.4
dnspython 2.2.1
importlib-metadata 4.8.3
patroni 3.2.2
pip 21.3.1
prettytable 2.5.0
psutil 5.9.8
psycopg2-binary 2.9.5
python-dateutil 2.9.0.post0
python-etcd 0.4.5
PyYAML 6.0.1
setuptools 59.6.0
six 1.16.0
typing_extensions 4.1.1
urllib3 1.26.18
wcwidth 0.2.13
ydiff 1.2
zipp 3.6.0
Create the Patroni configuration
[root@pg01 ~]# mkdir -p /app/patroni/etc/
[root@pg01 ~]# mkdir -p /app/patroni/log/
[root@pg01 ~]# chown postgres.postgres -R /app/patroni/
[root@pg01 ~]# cat > /app/patroni/etc/patroni_config.yml <<"EOF"
scope: pg_cluster
namespace: /service
name: pg01
log:
  level: INFO
  traceback_level: ERROR
  dir: /app/patroni/log
  file_num: 10
  file_size: 104857600
restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.122.77:8008
etcd:
  host: 192.168.122.225:2379
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    max_timelines_history: 0
    master_start_timeout: 300
    master_stop_timeout: 0
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 3000
        superuser_reserved_connections: 100
        max_locks_per_transaction: 64
        max_worker_processes: 2
        max_prepared_transactions: 0
        wal_level: logical
        wal_log_hints: on
        track_commit_timestamp: off
        max_wal_senders: 10
        max_replication_slots: 10
        listen_addresses: "*"
        port: 5432
        cluster_name: "pg_cluster"
        archive_mode: on
        archive_command: "cp %p /var/lib/pgsql/14/backups/%f"
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.122.77:5432
  data_dir: /var/lib/pgsql/14/data
  bin_dir: /usr/pgsql-14/bin
  pgpass: /var/lib/pgsql/.pgpass
  pg_ctl_timeout: 60
  use_pg_rewind: true
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: true
  authentication:
    replication:
      username: repli_user
      password: repli_password
    superuser:
      username: postgres
      password: postgres
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
EOF
[root@pg02 ~]# mkdir -p /app/patroni/etc/
[root@pg02 ~]# mkdir -p /app/patroni/log/
[root@pg02 ~]# chown postgres.postgres -R /app/patroni/
[root@pg02 ~]# cat > /app/patroni/etc/patroni_config.yml <<"EOF"
scope: pg_cluster
namespace: /service
name: pg02
log:
  level: INFO
  traceback_level: ERROR
  dir: /app/patroni/log
  file_num: 10
  file_size: 104857600
restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.122.85:8008
etcd:
  host: 192.168.122.138:2379
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    max_timelines_history: 0
    master_start_timeout: 300
    master_stop_timeout: 0
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 3000
        superuser_reserved_connections: 100
        max_locks_per_transaction: 64
        max_worker_processes: 2
        max_prepared_transactions: 0
        wal_level: logical
        wal_log_hints: on
        track_commit_timestamp: off
        max_wal_senders: 10
        max_replication_slots: 10
        listen_addresses: "*"
        port: 5432
        cluster_name: "pg_cluster"
        archive_mode: on
        archive_command: "cp %p /var/lib/pgsql/14/backups/%f"
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.122.85:5432
  data_dir: /var/lib/pgsql/14/data
  bin_dir: /usr/pgsql-14/bin
  pgpass: /var/lib/pgsql/.pgpass
  pg_ctl_timeout: 60
  use_pg_rewind: true
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: true
  authentication:
    replication:
      username: repli_user
      password: repli_password
    superuser:
      username: postgres
      password: postgres
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
EOF
[root@pg03 ~]# mkdir -p /app/patroni/etc/
[root@pg03 ~]# mkdir -p /app/patroni/log/
[root@pg03 ~]# chown postgres.postgres -R /app/patroni/
[root@pg03 ~]# cat > /app/patroni/etc/patroni_config.yml <<"EOF"
scope: pg_cluster
namespace: /service
name: pg03
log:
  level: INFO
  traceback_level: ERROR
  dir: /app/patroni/log
  file_num: 10
  file_size: 104857600
restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.122.102:8008
etcd:
  host: 192.168.122.119:2379
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    max_timelines_history: 0
    master_start_timeout: 300
    master_stop_timeout: 0
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 3000
        superuser_reserved_connections: 100
        max_locks_per_transaction: 64
        max_worker_processes: 2
        max_prepared_transactions: 0
        wal_level: logical
        wal_log_hints: on
        track_commit_timestamp: off
        max_wal_senders: 10
        max_replication_slots: 10
        listen_addresses: "*"
        port: 5432
        cluster_name: "pg_cluster"
        archive_mode: on
        archive_command: "cp %p /var/lib/pgsql/14/backups/%f"
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.122.102:5432
  data_dir: /var/lib/pgsql/14/data
  bin_dir: /usr/pgsql-14/bin
  pgpass: /var/lib/pgsql/.pgpass
  pg_ctl_timeout: 60
  use_pg_rewind: true
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: true
  authentication:
    replication:
      username: repli_user
      password: repli_password
    superuser:
      username: postgres
      password: postgres
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
EOF
Before running Patroni, the superuser (postgres) that Patroni uses must be allowed to log in locally; update pg_hba.conf:
-bash-4.2$ vim 14/data/pg_hba.conf
# server connected
host replication repli_user 192.168.122.77/32 md5
host replication repli_user 192.168.122.85/32 md5
host replication repli_user 192.168.122.102/32 md5
host all all 127.0.0.1/32 scram-sha-256
host all postgres 127.0.0.1/32 md5
Test Patroni in the foreground
-bash-4.2$ patroni /app/patroni/etc/patroni_config.yml
Check the Patroni log at the location configured above
-bash-4.2$ tail -f /app/patroni/log/patroni.log
2024-03-09 19:23:06,721 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:23:16,797 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:23:26,790 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:23:36,901 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:23:46,835 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:23:57,068 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:06,688 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:16,926 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:26,906 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:36,882 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:46,828 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:56,748 INFO: no action. I am (pg01), the leader with the lock
The logs on the other two nodes show:
2024-03-09 19:33:12,764 INFO: Lock owner: pg01; I am pg02
2024-03-09 19:33:12,849 INFO: failed to start postgres
2024-03-09 19:33:32,809 INFO: Lock owner: pg01; I am pg03
2024-03-09 19:33:32,888 INFO: failed to start postgres
Create a patroni.service systemd unit
cat > /usr/lib/systemd/system/patroni.service <<"EOF"
[Unit]
Description=patroni
After=network.target remote-fs.target nss-lookup.target
[Service]
Type=forking
User=postgres
Group=postgres
Environment="PGHOME=/usr/pgsql-14"
Environment="PGDATA=/var/lib/pgsql/14/data"
Environment="PGPORT=5432"
Environment="LD_LIBRARY_PATH=/usr/pgsql-14/lib"
Environment="PATH=/usr/pgsql-14/bin:/usr/local/bin:/usr/bin"
ExecStart=/bin/bash -c "/usr/local/bin/patroni /app/patroni/etc/patroni_config.yml >> /app/patroni/log/patroni.log 2>&1 &"
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
KillMode=process
TimeoutSec=30
Restart=no
[Install]
WantedBy=multi-user.target
EOF
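After creating the unit file, reload systemd and bring the service up (stop any Patroni instance still running in the foreground first); a sketch:
systemctl daemon-reload
systemctl enable patroni
systemctl start patroni
systemctl status patroni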
Check the Patroni cluster state
-bash-4.2$ patronictl -c /app/patroni/etc/patroni_config.yml list
+ Cluster: pg_cluster (7344240387828606601) ---+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+-----------------+---------+---------+----+-----------+
| pg01 | 192.168.122.77 | Leader | running | 7 | |
| pg02 | 192.168.122.85 | Replica | running | 4 | 48 |
| pg03 | 192.168.122.102 | Replica | running | 7 | 16 |
+--------+-----------------+---------+---------+----+-----------+
-bash-4.2$
If a replica's State column shows start failed, check whether the WAL that node synced (the .partial segment) is broken; if it is, delete the backups directory and rerun the base backup to re-sync:
pg_basebackup -F p -D 14/backups -h 192.168.122.77 -p 5432 -U repli_user -F p -v -R
7、Install PgBouncer on the primary node
yum install pgbouncer -y
vim /etc/pgbouncer/pgbouncer.ini
[databases]
my_test_db = host=localhost port=5432 dbname=my_test_db user=postgres   # note the database mapping
[pgbouncer]
logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid
listen_addr = *
listen_port = 6432
auth_type = md5   # must be consistent with the trust/md5 method used in pg_hba.conf
auth_file = /etc/pgbouncer/userlist.txt
admin_users = postgres
stats_users = postgres
vim /etc/pgbouncer/userlist.txt
"postgres" "SCRAM**"
systemctl restart pgbouncer
Log in to test
-bash-4.2$ psql -h 127.0.0.1 -p 6432 -d my_test_db
Password for user postgres:
psql (14.11)
Type "help" for help.
my_test_db=# \d pg_user;
View "pg_catalog.pg_user"
Column | Type | Collation | Nullable | Default
--------------+--------------------------+-----------+----------+---------
usename | name | | |
usesysid | oid | | |
usecreatedb | boolean | | |
usesuper | boolean | | |
userepl | boolean | | |
usebypassrls | boolean | | |
passwd | text | | |
valuntil | timestamp with time zone | | |
useconfig | text[] | C | |
my_test_db=#
8、Set up HAProxy and Keepalived
Hostname | IP | Installed components | Notes
---|---|---|---
ha_keep1 | 192.168.122.53 | haproxy, keepalived | Installed via yum
ha_keep2 | 192.168.122.54 | haproxy, keepalived | Installed via yum
yum makecache
yum update -y
yum install -y vim haproxy
setenforce 0   # SELinux otherwise interferes with haproxy binding its ports
Add /etc/hosts entries:
192.168.122.77 pg01
192.168.122.85 pg02
192.168.122.102 pg03
Edit the HAProxy configuration file.
The primary's PostgreSQL is reached through PgBouncer on port 6432;
the two replicas are reached directly on port 5432.
vim /etc/haproxy/haproxy.cfg
global
    log         127.0.0.1 local2        # log to the local syslog
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     3000
    user        haproxy
    group       haproxy
    daemon
    nbproc      1
    stats socket /var/lib/haproxy/stats
defaults
    mode                    tcp         # mode {tcp|http|health}
    log                     127.0.0.1 local2 err   # use syslog
    option                  tcplog
    option                  dontlognull
    option                  redispatch
    option                  abortonclose
    retries                 3
    maxconn                 3000
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout check           5s
listen status
    bind *:1080
    mode http
    log global
    stats enable
    stats refresh 30s
    maxconn 10
    stats uri /                         # http://ip:1080/
    stats realm Private\ lands
    stats auth admin:Admin2023          # user:password
    stats hide-version
listen master
    bind *:5000
    mode tcp
    option tcplog
    balance roundrobin                  # round-robin across healthy servers
    option httpchk OPTIONS /master      # health check against the Patroni REST API
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg01 192.168.122.77:6432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
    server pg02 192.168.122.85:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
    server pg03 192.168.122.102:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
setenforce 0
systemctl start haproxy
firewall-cmd --add-port=1080/tcp   # open the stats page port (http)
The HAProxy stats page can now be reached at http://192.168.122.53:1080/ and http://192.168.122.54:1080/
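Client connections will arrive on port 5000 (directly or through the floating IP later), so that port presumably needs to be opened as well, and haproxy should be enabled at boot; a sketch:
firewall-cmd --add-port=5000/tcp
firewall-cmd --permanent --add-port=1080/tcp
firewall-cmd --permanent --add-port=5000/tcp
systemctl enable haproxy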
Install Keepalived
Install keepalived on both .53 and .54
yum install keepalived -y
Configure .53 as the MASTER node and add the floating IP address
[root@ha_keep1 ~]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
router_id pgservice_router
}
vrrp_script chk_haproxy {
script "/usr/bin/killall -0 haproxy"
interval 2
weight 5
fall 3
rise 5
timeout 2
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 53
priority 200
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.122.3
}
track_script {
chk_haproxy
}
}
[root@ha_keep1 ~]# systemctl restart keepalived
[root@ha_keep1 ~]# systemctl enable keepalived
[root@ha_keep1 ~]# systemctl status keepalived
[root@ha_keep1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:00:a1:49 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.53/24 brd 192.168.122.255 scope global noprefixroute dynamic eth0
valid_lft 3128sec preferred_lft 3128sec
inet 192.168.122.3/32 scope global eth0   # the floating address has appeared
valid_lft forever preferred_lft forever
inet6 fe80::ee78:b71e:23c0:7fc4/64 scope link noprefixroute
valid_lft forever preferred_lft forever
Configure .54 as the BACKUP node (since it is the backup, the floating address currently remains on the master, .53)
[root@ha_keep2 ~]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
router_id pgservice_router
}
vrrp_script chk_haproxy {
script "/usr/bin/killall -0 haproxy"
interval 2
weight 5
fall 3
rise 5
timeout 2
}
vrrp_instance VI_1 {
state BACKUP   # changed to the BACKUP role
interface eth0
virtual_router_id 53
priority 100   # lower priority than the master
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.122.3
}
track_script {
chk_haproxy
}
}
[root@ha_keep2 ~]# systemctl restart keepalived
[root@ha_keep2 ~]# systemctl enable keepalived
[root@ha_keep2 ~]# systemctl status keepalived
[root@ha_keep2 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:a2:85:b8 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.54/24 brd 192.168.122.255 scope global noprefixroute dynamic eth0
valid_lft 3572sec preferred_lft 3572sec
inet6 fe80::df8b:cdfb:69d3:8f24/64 scope link noprefixroute
valid_lft forever preferred_lft forever
Use the floating address http://192.168.122.3:1080/ to view the HAProxy status page through Keepalived.
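As a final end-to-end check, a connection through the floating IP and the port-5000 frontend should land on the current Patroni leader (PgBouncer on pg01); the database and user names follow the earlier examples:
psql -h 192.168.122.3 -p 5000 -U postgres -d my_test_db -c "SELECT pg_is_in_recovery();"   # expect f, i.e. connected to the primary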
END