Linux|centos7-postgresql数据库|yum安装数据库和配置repmgr高可用集群以及repmgr的日常管理工作

news2025/1/13 7:23:40

一、

前言

postgresql 的yum部署其实还是有点东西的,本文就做一个小小的记录,高可用方面repmgr插件还是非常不错的,但如何部署以及部署后如何使用也是一个难点,因此,也在本文里做一个记录

环境介绍:

第一台服务器:操作系统是centos7,内核版本是3.10,IP地址是192.168.123.17  VMware虚拟机,内存4G,cpu4核,计划在此服务器上安装postgresql12.4版本以及repmgr高可用插件。该服务器是主节点

第二台服务器:操作系统是centos7,内核版本是3.10,IP地址是192.168.123.20  VMware虚拟机,内存4G,cpu4核,计划在此服务器上安装postgresql12.4版本以及repmgr高可用插件。该服务器是从节点

二、

详细的yum安装postgreslq数据库

1,配置yum源

yum安装postgresql比较简单,源配置官方源就可用了,命令如下(2台服务器都执行):

yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

为了防止某些库安装的时候提示缺少依赖,增加阿里云的库和阿里云的epel库

wget -O /etc/yum.repos.d/epel.repo https://mirrors.aliyun.com/repo/epel-7.repo
wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo

 增加不常用的rpm包依赖仓库:

[centos-sclo-sclo]
name=CentOS-7 - SCLo sclo
baseurl=https://mirrors.tuna.tsinghua.edu.cn/centos/7/sclo/$basearch/sclo/
gpgcheck=0
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-SCLo
 
[centos-sclo-rh]
name=CentOS-7 - SCLo rh
baseurl=https://mirrors.tuna.tsinghua.edu.cn/centos/7/sclo/$basearch/rh/
gpgcheck=0
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-SCLo

2,

安装postgresql的组件和postgresql的server服务(2台服务器都执行)

组件比较多,主要是lib库(postgresql运行的依赖库),插件扩展库(postgresql的插件库),开发库(一些postgresql的特殊组件需要使用的库),连接库(开发微服务使用的库)

后面的几个看自己的需求安装不,比如postgresql12-plperl

版本号根据自己需求来,我这里是12.4, 不想安这个低版本的,修改12.4就可以了,比如全部的12.4修改为12.8

yum install postgresql12-server-12.4 postgresql12-libs-12.4 postgresql12-devel-12.4 postgresql12-contrib-12.4 postgresql12-tcl postgresql12-odbc postgresql12-12.4  postgresql12-plperl-12.4 postgresql12-pltcl-12.4 postgresql12-plpython-12.4  postgresql12-plpython3-12.4 postgresql12-llvmjit-12.4 postgresql12-devel-12.4  -y

3,

postgresql的环境变量定义和一些特殊处理(两台服务器都执行)

在/etc/profile 文件末尾添加如下变量,计划postgresql安装在/data/pgsql/data目录下

PGDATA=/data/pgsql/data
export PGDATA
PGHOME=/data/pgsql
export PGHOME
PATH=$PATH:$PGHOME/bin:$PGDATA:/usr/pgsql-12/bin/
export PATH  PGHOME PGDATA

写入完成后,激活变量:

source /etc/profile

切换用户到普通用户postgres,将该用户的环境变量文件全部注释掉,结果如下:

-bash-4.2$ whoami
postgres
-bash-4.2$ cat ~/.bash_profile 
#[ -f /etc/profile ] && source /etc/profile
#PGDATA=/var/lib/pgsql/12/data
#export PGDATA
# If you want to customize your settings,
# Use the file below. This is not overridden
# by the RPMS.
#[ -f /var/lib/pgsql/.pgsql_profile ] && source /var/lib/pgsql/.pgsql_profile

注释的原因是我们不希望使用默认的/var/lib/12目录存放数据库的数据文件,注销普通用户再次su - postgres 重新激活变量后,在普通用户下,可以看到postgresql数据库相关变量已经固定了:

[root@centos10 media]# su - postgres
Last login: Sat Mar 30 05:19:48 CST 2024 on pts/0
-bash-4.2$ echo $PGDATA
/data/pgsql/data

4,创建相关目录并赋予普通用户的权限,准备初始化工作(root用户执行,两台服务器都执行):

mkdir -p /data/pgsql/
chown -Rf postgres. /data/pgsql

 

5,执行数据库初始化(仍然是切换到普通用户postgres):

由于前面的准备工作比较充分,环境变量已经完全配置好了,因此,执行initdb就可以完成初始化了,/data/pgsql/data/目录以及其下的文件将在初始化的时候创建好

初始化日志如下:

[root@centos7 media]# su - postgres
Last login: Sat Mar 30 05:25:23 CST 2024 on pts/0
-bash-4.2$ initdb
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /data/pgsql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Asia/Shanghai
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /data/pgsql/data -l logfile start

6,

在root用户下管理postgresql的启停

由于是yum安装的,因此,启停脚本是已经写好的,可以直接用,但我们数据库的数据文件存放路径修改了,因此,需要先修改一下启停脚本:

/usr/lib/systemd/system/postgresql-12.service

Environment=PGDATA=/var/lib/pgsql/12/data/
修改为
Environment=PGDATA=/data/pgsql/data/

这个启停脚本名字太长了,修改一下名字 就叫pg12吧:

mv /usr/lib/systemd/system/postgresql-12.service /usr/lib/systemd/system/pg12.service

启动postgresql数据库并查看进行确认是否启动成功:

systemctl enable pg12 && systemctl start pg12

查看postgreslq的进程:

[root@centos7 media]# ps aux |grep postgre
postgres 16161  0.2  0.4 397292 17424 ?        Ss   05:51   0:00 /usr/pgsql-12/bin/postmaster -D /data/pgsql/data/
postgres 16164  0.0  0.0 251676  2008 ?        Ss   05:51   0:00 postgres: logger   
postgres 16166  0.0  0.0 397292  2060 ?        Ss   05:51   0:00 postgres: checkpointer   
postgres 16167  0.0  0.0 397292  2288 ?        Ss   05:51   0:00 postgres: background writer   
postgres 16168  0.0  0.0 397292  2064 ?        Ss   05:51   0:00 postgres: walwriter   
postgres 16169  0.0  0.0 397844  3212 ?        Ss   05:51   0:00 postgres: autovacuum launcher   
postgres 16170  0.0  0.0 251672  2120 ?        Ss   05:51   0:00 postgres: stats collector   
postgres 16171  0.0  0.0 397844  2808 ?        Ss   05:51   0:00 postgres: logical replication launcher   
root     16173  0.0  0.0 112712   964 pts/0    S+   05:51   0:00 grep --color=auto postgre

7,

远程连接和postgres用户密码设置

修改主配置文件 /data/pgsql/data/postgresql.conf 

listen_addresses = '*'          
port = 15433                            
max_connections = 1000   
wal_level = logical

这里是设置数据库连接端口是15433 

修改安全连接文件pg_hba.conf:

末尾添加如下两行,这两行的意思是复制用户replication可以随意连接,但其它用户远程连接的时候需要密码,本地登录的时候不校验密码

host    replication     all             0.0.0.0/0                 trust
host    all             all             0.0.0.0/0           md5

8,

本地登录数据库设置密码

su - postgres -c "psql -p15433"

alter user postgres with password '123456';

此时,数据库算是完全安装完毕,连接端口15433,连接密码123456 后面就不在重复这些信息了

随意安装一个插件,看看lib库什么的是否正常:

[root@centos10 ~]# su - postgres -c "psql -p15433"
psql (12.4)
Type "help" for help.

postgres=# create extension 
adminpack           btree_gin           dblink              file_fdw            hstore_plperlu      insert_username     jsonb_plperl        jsonb_plpythonu     ltree_plpython3u    pg_buffercache      pgrowlocks          pg_visibility       plpython3u          postgres_fdw        tablefunc           unaccent
amcheck             btree_gist          dict_int            fuzzystrmatch       hstore_plpython2u   intagg              jsonb_plperlu       lo                  ltree_plpythonu     pgcrypto            pg_stat_statements  plperl              plpythonu           refint              tcn                 "uuid-ossp"
autoinc             citext              dict_xsyn           hstore              hstore_plpython3u   intarray            jsonb_plpython2u    ltree               moddatetime         pg_freespacemap     pgstattuple         plperlu             pltcl               seg                 tsm_system_rows     xml2
bloom               cube                earthdistance       hstore_plperl       hstore_plpythonu    isn                 jsonb_plpython3u    ltree_plpython2u    pageinspect         pg_prewarm          pg_trgm             plpython2u          pltclu              sslinfo             tsm_system_time     
postgres=# create extension pg_trgm ;
CREATE EXTENSION

二,

repmgr的基本安装部署和配置

repmgr的介绍:

Repmgr是2ndQuadrant(第二象限公司)开发的一款复制的开源工具套件,用于管理PostgreSQL服务器集群中的复制和故障转移。

最初,它主要是为了简化流副本的管理,后来发展成为一个完整的故障转移管理套件。它通过设置备用服务器,监视复制以及执行管理任务(如故障转移或手动切换操作)的工具,增强了PostgreSQL内置的热备份功能。

其中各组件功能如下:

No.1 Repmgrd 守护进程

它主动监视复制集群中的服务器并执行以下任务:

1)监控和记录集群复制性能;

2)通过检测主服务器故障并提升最合适的备用服务器来执行故障转移;

3)将有关群集中事件的通知提供给用户定义的脚本,该脚本可以执行诸如通过电子邮件发送警报等任务;

4)repmgrd 根据本地数据库角色不同,其功能也不同:

  • 主库:repmgrd仅监控本地数据库,负责自动恢复、同异步切换;

  • 备库:repmgrd监控本地数据库和主数据库,负责自动切换、复制槽删除。

No.2 Repmgr命令管理

用于执行管理任务的命令行工具,主要有以下方面作用:

1)设置备用服务器;

2)将备用服务器升级为主服务器;

3)切换主服务器和备用服务器;

4)显示复制群集中的服务器状态。

No.3 用户和元数据 为了有效地管理复制集群,repmgr提供专用数据库存储和管理有关repmgr集群服务的相关信息。

此模式在 部署repmgr服务时,由repmgr扩展自动创建,该扩展在初始化repmgr -administered集群(repmgr主寄存器) 的第一步中安装,包含以下对象:

  • 表:

    repmgr.events:记录感兴趣的事件;

    repmgr.nodes:复制群集中每个服务器的连接和状态信息 ;

    repmgr.monitoring_history:repmgrd写入的历史备用监视信息。

  • 视图

    repmgr.show_nodes:基于表repmgr.nodes,另外显示服务器上游节点的名称。

    repmgr.replication_status:启用repmgrd的监视时,显示每个备用数据库的当前监视状态。

    repmgr元数据模式可以存储在现有的数据库或在自己的专用数据库。

请注意,repmgr元数据模式不能驻留在不属于repmgr管理的复制集群的数据库服务器上。数据库用户必须可供repmgr访问此数据库并执行必要的更改。此用户不需要是超级用户,但是某些操作(如初始安装repmgr扩展)将需要超级用户连接(可以使用命令 行选项--superuser在需要时指定 )。

1、

计划repmgr安装方式为yum,安装命令如下:

yum install repmgr_12-5.4.1       repmgr_12-devel-5.4.1  repmgr_12-llvmjit-5.4.1  -y

2、

主数据库装载repmgr插件

修改主数据库17的主配置文件,data/pgsql/data/postgresql.conf,修改完毕后重启服务器

主要是定义数据库的端口,以及归档命令

listen_addresses = '*'
port = 15433
max_connections = 1000	
shared_buffers = 128MB	
dynamic_shared_memory_type = posix
wal_level = logical	
max_wal_size = 1GB
min_wal_size = 80MB
archive_mode = on	
archive_command = 'test ! -f /data/pgsql/arclog/%f && cp %p /data/pgsql/arclog/%f'		# 
log_destination = 'stderr'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%a.log'	
log_truncate_on_rotation = on		
log_rotation_age = 1d			
log_rotation_size = 0			
log_line_prefix = '%m [%p] '		
log_timezone = 'Asia/Shanghai'
datestyle = 'iso, mdy'
timezone = 'Asia/Shanghai'
lc_messages = 'en_US.UTF-8'							
lc_monetary = 'en_US.UTF-8'			
lc_numeric = 'en_US.UTF-8'			
lc_time = 'en_US.UTF-8'	
default_text_search_config = 'pg_catalog.english'
shared_preload_libraries = 'repmgr'
wal_log_hints = on					

DETAIL: pg_rewind requires "wal_log_hints" to be enabled 

根据以上配置,创建目录/data/pgsql/arclog并赋予postgres这个用户权限,最后重启服务器

mkdir -p /data/pgsql/arclog
chown -Rf postgres. /data/pgsql/arclog/
systemctl restart pg12
ntpdate ntp.aliyun.com

3、

主节点17repmgr的配置文件 /etc/repmgr/12/repmgr.conf 增加如下内容:

node_id=1
node_name='pg1'
conninfo='host=192.168.123.17 port=15433 user=repmgr dbname=repmgr password=123456'
data_directory='/data/pgsql/data'
replication_user='repmgr'
replication_type='physical'
#location='pg1'                                  
use_replication_slots=true
witness_sync_interval=15
log_level='DEBUG'
log_facility='STDERR'
log_file='/var/log/repmgr/pg2.log'
pg_bindir='/usr/pgsql-12/bin/'
ssh_options='-q -o ConnectTimeout=10'
promote_check_timeout=60
promote_check_interval=1
node_rejoin_timeout=60
failover='automatic'
#priority=100                                                   
connection_check_type=ping
promote_command='/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-12/bin/repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
monitoring_history=yes
service_start_command = '/usr/bin/systemctl start pg12'
service_stop_command = '/usr/bin/systemctl stop pg12'
service_restart_command = '/usr/bin/systemctl restart pg12'
service_reload_command = '/usr/bin/systemctl reload pg12'
service_promote_command = '/usr/pgsql-12/bin/pg_ctl promote -D /data/pgsql/data'
repmgrd_service_start_command = '/usr/pgsql-12/bin/repmgrd -f /etc/repmgr/12/repmgr.conf -p /run/repmgr/repmgrd-12.pid -d --verbose'
repmgrd_service_stop_command = '/usr/bin/kill `cat /run/repmgr/repmgrd-12.pid`'

 

4、

根据以上配置文件,数据库添加repmgr用户

[root@centos7 ~]# su - postgres -c 'psql -p15433'
psql (12.4)
Type "help" for help.

postgres=# create user repmgr with password '123456';
CREATE ROLE
postgres=# create database repmgr;
CREATE DATABASE




ALTER USER repmgr WITH REPLICATION;

5、

切换普通用户,利用repmgr注册主节点

[root@centos7 ~]# su - postgres
Last login: Sat Mar 30 23:08:48 CST 2024 on pts/0
-bash-4.2$ repmgr primary register
INFO: connecting to primary database...
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
ERROR: connection to database failed
DETAIL: 
fe_sendauth: no password supplied

DETAIL: attempted to connect using:
  user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path=

失败了,原因是没有传递密码变量,执行变量命令,并重新注册,注册的时候临时给repmgr用户超级权限,注册成功

export PGPASSWORD="123456"

-bash-4.2$ repmgr primary register -S postgres
WARNING: following problems with command line parameters detected:
  --superuser ignored when executing PRIMARY REGISTER
INFO: connecting to primary database...
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
NOTICE: attempting to install extension "repmgr"
DEBUG: established superuser connection as "postgres"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered

可以看到,扩展插件repmgr启用了,具体路径为repmgr数据库下的名为repmgr的schema下

 命令行也可以佐证确实是安装了插件repmgr:

repmgr=# set search_path to repmgr ;
SET
repmgr=# \dx
                    List of installed extensions
  Name   | Version |   Schema   |            Description             
---------+---------+------------+------------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
 repmgr  | 5.4     | repmgr     | Replication manager for PostgreSQL
(2 rows)

如果觉得注册的不对,可以重新注册,重新注册命令如下:

-bash-4.2$ repmgr primary unregister -S postgres --force
WARNING: following problems with command line parameters detected:
  --superuser ignored when executing PRIMARY UNREGISTER
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
INFO: node "pg1" (ID: 1) was successfully unregistered

看到 successfully后,再次执行注册就可以,查看repmgr集群的状态:

-bash-4.2$ repmgr cluster show
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                       
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------
 1  | pg1  | primary | * running |          | pg1      | 100      | 1        | host=192.168.123.17 port=15433 user=repmgr dbname=repmgr

6、

从节点20的repmgr的配置文件

node_id=2
node_name='pg2'
conninfo='host=192.168.123.20 port=15433 user=repmgr dbname=repmgr password=123456'
data_directory='/data/pgsql/data'
replication_user='repmgr'
replication_type='physical'
#location='pg2'
use_replication_slots=true
witness_sync_interval=15
log_level='DEBUG'
log_facility='STDERR'
log_file='/var/log/repmgr/pg2.log'
pg_bindir='/usr/pgsql-12/bin/'
ssh_options='-q -o ConnectTimeout=10'
promote_check_timeout=60
promote_check_interval=1
node_rejoin_timeout=60
failover='automatic'
#priority=120
connection_check_type=ping
promote_command='/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-12/bin/repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
monitoring_history=yes
service_start_command = '/usr/bin/systemctl start pg12'
service_stop_command = '/usr/bin/systemctl stop pg12'
service_restart_command = '/usr/bin/systemctl restart pg12'
service_reload_command = '/usr/bin/systemctl reload pg12'
service_promote_command = '/usr/pgsql-12/bin/pg_ctl promote -D /data/pgsql/data'
repmgrd_service_start_command = '/usr/pgsql-12/bin/repmgrd -f /etc/repmgr/12/repmgr.conf -p /run/repmgr/repmgrd-12.pid -d --verbose'
repmgrd_service_stop_command = '/usr/bin/kill `cat /run/repmgr/repmgrd-12.pid`'

7、

从节点利用流复制搭建主从集群,通过repmgr命令来执行(postgres这个普通用户执行)

repmgr standby clone -h 192.168.123.17 -p15433 -d repmgr --force  -U repmgr -c

HINT: this may take some time; consider using the -c/--fast-checkpoint option 

日志如下:

NOTICE: destination directory "/data/pgsql/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.123.17 port=15433 user=repmgr dbname=repmgr
DETAIL: current installation size is 32 MB
DEBUG: 1 node records returned by source node
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: upstream_node_id determined as 1
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: directory "/data/pgsql/data" exists but is not empty
NOTICE: -F/--force provided - deleting existing data directory "/data/pgsql/data"
DEBUG: replication slot "repmgr_slot_2" exists but is inactive; reusing
NOTICE: starting backup (using pg_basebackup)...
INFO: executing:
  /usr/pgsql-12/bin/pg_basebackup -l "repmgr base backup"  -D /data/pgsql/data -h 192.168.123.17 -p 15433 -U repmgr -c fast -X stream -S repmgr_slot_2 
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
could not identify current directory: No such file or directory
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: /usr/bin/systemctl start pg12
HINT: after starting the server, you need to register this standby with "repmgr standby register"

命令执行完成后,就可以启动从节点了:

systemctl start pg12

查看进程可以看到从节点正常复制了:

[root@centos10 ~]# ps aux |grep postgre
root      1637  0.0  0.0 191784  2336 pts/0    S    Mar30   0:00 su - postgres
postgres  1638  0.0  0.0 115444  2064 pts/0    S+   Mar30   0:00 -bash
postgres  1744  0.3  1.1 450504 46412 ?        Ss   00:05   0:00 /usr/pgsql-12/bin/postmaster -D /data/pgsql/data/
postgres  1745  0.0  0.0 260664  2084 ?        Ss   00:05   0:00 postgres: logger   
postgres  1746  0.0  0.0 450708  3100 ?        Ss   00:05   0:00 postgres: startup   recovering 000000010000000000000013
postgres  1747  0.0  0.0 450504  2112 ?        Ss   00:05   0:00 postgres: checkpointer   
postgres  1748  0.0  0.0 450504  2120 ?        Ss   00:05   0:00 postgres: background writer   
postgres  1749  0.0  0.0 262784  2008 ?        Ss   00:05   0:00 postgres: stats collector   
postgres  1750  0.4  0.1 457520  4700 ?        Ss   00:05   0:00 postgres: walreceiver   streaming 0/130007C8
root      1752  0.0  0.0 112712   964 pts/1    S+   00:05   0:00 grep --color=auto postgre

主节点正常发送wal日志了:

[root@centos7 ~]# ps aux |grep postgres
postgres  2205  0.0  1.1 450504 46464 ?        Ss   00:01   0:00 /usr/pgsql-12/bin/postmaster -D /data/pgsql/data/
postgres  2207  0.0  0.0 260664  2084 ?        Ss   00:01   0:00 postgres: logger   
postgres  2209  0.0  0.1 450908  4180 ?        Ss   00:01   0:00 postgres: checkpointer   
postgres  2210  0.0  0.0 450764  3916 ?        Ss   00:01   0:00 postgres: background writer   
postgres  2211  0.0  0.1 450504  6296 ?        Ss   00:01   0:00 postgres: walwriter   
postgres  2212  0.0  0.0 451784  3352 ?        Ss   00:01   0:00 postgres: autovacuum launcher   
postgres  2213  0.0  0.0 262784  2196 ?        Ss   00:01   0:00 postgres: archiver   last was 000000010000000000000012.00000028.backup
postgres  2214  0.0  0.0 262940  2352 ?        Ss   00:01   0:00 postgres: stats collector   
postgres  2215  0.0  0.0 451624  2856 ?        Ss   00:01   0:00 postgres: logical replication launcher   
root      2216  0.0  0.0 191784  2336 pts/0    S    00:01   0:00 su - postgres
postgres  2217  0.0  0.0 115444  2004 pts/0    S+   00:01   0:00 -bash
postgres  2260  0.0  0.0 451812  3848 ?        Ss   00:05   0:00 postgres: walsender repmgr 192.168.123.20(38408) streaming 0/130008B0
postgres  2265  0.0  0.1 452464  7044 ?        Ss   00:07   0:00 postgres: postgres postgres 192.168.123.1(56048) idle
root      2294  0.0  0.0 112712   960 pts/1    S+   00:10   0:00 grep --color=auto postgres

8、

注册从节点

-bash-4.2$ repmgr standby register
INFO: connecting to local node "pg2" (ID: 2)
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
INFO: connecting to primary database
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
INFO: standby registration complete
NOTICE: standby node "pg2" (ID: 2) successfully registered

9、

查看集群状态

-bash-4.2$ repmgr cluster show
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                           
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------
 1  | pg1  | primary | * running |          | default  | 100      | 1        | host=192.168.123.17 port=15433 user=repmgr dbname=repmgr password=123456
 2  | pg2  | standby |   running | pg1      | default  | 100      | 1        | host=192.168.123.20 port=15433 user=repmgr dbname=repmgr password=123456



三、

自动故障转移和测试

如果需要启用自动故障转移,那么,我们需要开启repmgrd服务,每个节点都开启

开启比较简单,一条命令就可以了

systemctl start repmgr-12

服务启动完毕后,可以查看进程是否正确:

[root@centos7 ~]# ps aux |grep repm
postgres  5915  0.2  0.2 453488  9716 ?        Ss   09:28   0:00 postgres: repmgr repmgr 192.168.123.17(35544) idle
postgres  5917  0.0  0.0  84276  2160 ?        S    09:28   0:00 /usr/pgsql-12/bin/repmgrd -f /etc/repmgr/12/repmgr.conf -p /run/repmgr/repmgrd-12.pid -d --verbose
root      5925  0.0  0.0 112712   960 pts/3    S+   09:29   0:00 grep --color=auto repm

注意,这里问题来了,该服务是依赖于postgresql数据库的进程的,数据库必须是在线的才可以启停repmgrd服务

那么,很多人会疑惑,这个服务有什么用?其实很简单,就是故障转移用的,这些是关键配置,在发生宕机的情况下,这些命令会自动的启停数据库,例如,主节点宕机,那么,从节点的该服务将会在规定时间内检查主节点,如果一直找不到主,将会把自己提升为主,先执行这个命令:

promote_command='/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'

然后执行这个命令:

service_promote_command = '/usr/pgsql-12/bin/pg_ctl promote -D /data/pgsql/data'

特别注意:这些命令全部需要绝对路径,否则会报找不到命令,配置文件修改后,重启repmgrd服务就可以生效

promote_command='/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-12/bin/repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
monitoring_history=yes
service_start_command = '/usr/bin/systemctl start pg12'
service_stop_command = '/usr/bin/systemctl stop pg12'
service_restart_command = '/usr/bin/systemctl restart pg12'
service_reload_command = '/usr/bin/systemctl reload pg12'
service_promote_command = '/usr/pgsql-12/bin/pg_ctl promote -D /data/pgsql/data'
repmgrd_service_start_command = '/usr/pgsql-12/bin/repmgrd -f /etc/repmgr/12/repmgr.conf -p /run/repmgr/repmgrd-12.pid -d --verbose'
repmgrd_service_stop_command = '/usr/bin/kill `cat /run/repmgr/repmgrd-12.pid`'

下面是将主节点数据库服务关闭后,从节点的repmgrd服务打印的日志/var/log/repmgr/pg2.log

[2024-03-31 09:13:29] [INFO] 2 total nodes registered
[2024-03-31 09:13:29] [INFO] primary node  "pg1" (ID: 1) and this node have the same location ("default")
[2024-03-31 09:13:29] [INFO] no other sibling nodes - we win by default
[2024-03-31 09:13:29] [DEBUG] election result: WON
[2024-03-31 09:13:29] [NOTICE] this node is the only available candidate and will now promote itself
[2024-03-31 09:13:29] [DEBUG] get_node_record():
  SELECT n.node_id, n.type, n.upstream_node_id, n.node_name,  n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached   FROM repmgr.nodes n  WHERE n.node_id = 1
[2024-03-31 09:13:29] [INFO] promote_command is:
  "/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file"
[2024-03-31 09:13:29] [NOTICE] redirecting logging output to "/var/log/repmgr/pg2.log"

[2024-03-31 09:13:29] [DEBUG] connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
[2024-03-31 09:13:29] [DEBUG] connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
[2024-03-31 09:13:29] [DEBUG] connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
[2024-03-31 09:13:29] [NOTICE] promoting standby to primary
[2024-03-31 09:13:29] [DETAIL] promoting server "pg2" (ID: 2) using "/usr/pgsql-12/bin/pg_ctl promote -D /data/pgsql/data"
[2024-03-31 09:13:29] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2024-03-31 09:13:29] [DEBUG] setting node 2 as primary and marking existing primary as failed
[2024-03-31 09:13:29] [NOTICE] STANDBY PROMOTE successful
[2024-03-31 09:13:29] [DETAIL] server "pg2" (ID: 2) was successfully promoted to primary
[2024-03-31 09:13:29] [DEBUG] result of promote_command: 0
[2024-03-31 09:13:29] [INFO] checking state of node 2, 1 of 6 attempts

再次启动原主节点的postgresql数据库,在主节点查看集群状态,可以发现脑裂了:

-bash-4.2$ repmgr cluster show
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
 ID | Name | Role    | Status               | Upstream | Location | Priority | Timeline | Connection string                                                           
----+------+---------+----------------------+----------+----------+----------+----------+------------------------------------------------------------------------------
 1  | pg1  | primary | * running            |          | default  | 100      | 1        | host=192.168.123.17 port=15433 user=repmgr dbname=repmgr password=123456
 2  | pg2  | standby | ! running as primary |          | default  | 100      | 2        | host=192.168.123.20 port=15433 user=repmgr dbname=repmgr password=123456

WARNING: following issues were detected
  - node "pg2" (ID: 2) is registered as standby but running as primary

在从节点,可以看到显示的不一样,不过仍然是脑裂:

-bash-4.2$ repmgr cluster show
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                           
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------
 1  | pg1  | primary | - failed  | ?        | default  | 100      |          | host=192.168.123.17 port=15433 user=repmgr dbname=repmgr password=123456
 2  | pg2  | primary | * running |          | default  | 100      | 2        | host=192.168.123.20 port=15433 user=repmgr dbname=repmgr password=123456

WARNING: following issues were detected
  - unable to connect to node "pg1" (ID: 1)

解决方案:

首先,在从节点强制踢掉自己,数据库服务不能关闭,命令如下:

repmgr primary unregister --force

然后关闭从节点的数据库服务,重新从现主17服务器拉取:

repmgr standby clone -h 192.168.123.17 -p15433 -d repmgr --force  -U repmgr -c

同步拉取完毕后,在重新强制注册为standby节点,就恢复正常了:

-bash-4.2$ repmgr standby register --force
INFO: connecting to local node "pg2" (ID: 2)
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
INFO: connecting to primary database
DEBUG: connecting to: "user=repmgr password=1234562 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
INFO: standby registration complete
NOTICE: standby node "pg2" (ID: 2) successfully registered

再次查看集群状态,发现恢复正常了:

-bash-4.2$ repmgr cluster show
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                           
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------
 1  | pg1  | primary | * running |          | default  | 100      | 1        | host=192.168.123.17 port=15433 user=repmgr dbname=repmgr password=123456
 2  | pg2  | standby |   running | pg1      | default  | 100      | 1        | host=192.168.123.20 port=15433 user=repmgr dbname=repmgr password=123456

总结:

脑裂的处理步骤是,首先,将异常的节点也就是 running as primary的节点踢掉,这个节点在哪,就在哪个节点踢掉,踢节点也就是unregister,踢完后standby clone,重新克隆的时候需要停止异常节点数据库服务,也就是原从节点的数据库服务,完事后在重新注册并再次启动原从节点的数据库就可以了;而脑裂产生的原因是原主节点数据库挂掉后,又重新启动数据库服务了,重新启动数据库后,原主认为自己是主节点,而原从节点已经提升为主节点,它也认为自己是主节点,造成集群内有两个主节点的信息了,那么,此时集群的主从复制就是混乱状态

如果脑裂很快发现按上述步骤没什么问题,但如果时间很长才发现,那么,需要根据pg_controlldata命令确定哪个是主,哪个是从,Latest checkpoint's oldestXID高的为主,反之为从,在从节点按上述操作即可

-bash-4.2$ pg_controldata 
pg_control version number:            1201
Catalog version number:               201909212
Database system identifier:           7351900556498275811
Database cluster state:               in production
pg_control last modified:             Sun 31 Mar 2024 11:41:05 AM CST
Latest checkpoint location:           0/2F001528
Latest checkpoint's REDO location:    0/2F0014F0
Latest checkpoint's REDO WAL file:    00000001000000000000002F
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:3382
Latest checkpoint's NextOID:          16454
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        479
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  3382
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Sun 31 Mar 2024 11:41:05 AM CST
Fake LSN counter for unlogged rels:   0/3E8
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    logical
wal_log_hints setting:                on
max_connections setting:              1000
max_worker_processes setting:         8
max_wal_senders setting:              10
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0
Mock authentication nonce:            e76af00554364729517eece5fa0fcf6f7f0662da3a154cabb730c2ab24edd267

重新加入节点:

repmgr node rejoin -h 192.168.123.20 -p 15433 -Upostgres -d repmgr --force-rewind

四,

主备手动切换

repmgr standby switchover --force -S postgres

这个命令比较方便,在从节点执行,不过需要配置免密,并且需要standby从库有上级,比如下面的,Upstream这一栏里,表明pg1的上级是pg2,pg2是主节点,pg1是从节点,

-bash-4.2$ repmgr cluster show
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                           
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------
 1  | pg1  | standby |   running | pg2      | default  | 100      | 2        | host=192.168.123.17 port=15433 user=repmgr dbname=repmgr password=123456
 2  | pg2  | primary | * running |          | default  | 100      | 3        | host=192.168.123.20 port=15433 user=repmgr dbname=repmgr password=123456

那么,想快速的把这两个节点调换一个位置就是上面的switchover命令了,非常的方便,比如,现在我又想把pg2切换到从,pg1切换到主,那么,应该是在pg1上执行switchover命令

在切换期间,pg2的数据库需要重启一次,重启完毕后,看pg1的日志如下:

EBUG: minimum of 1 free slots (0 for siblings) required; 10 available
NOTICE: attempting to pause repmgrd on 2 nodes
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: pausing repmgrd on node "pg1" (ID: 1)
DEBUG: pausing repmgrd on node "pg2" (ID: 2)
NOTICE: local node "pg1" (ID: 1) will be promoted to primary; current primary "pg2" (ID: 2) will be demoted to standby
NOTICE: stopping current primary node "pg2" (ID: 2)
DEBUG: remote_command():
  ssh -o Batchmode=yes -q -o ConnectTimeout=10 192.168.123.20 /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf node service --action=stop --checkpoint --superuser=postgres
NOTICE: issuing CHECKPOINT on node "pg2" (ID: 2) 
DETAIL: executing server command "/usr/bin/systemctl stop pg12"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 4 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 5 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 6 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 7 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 8 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 9 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 10 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 11 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 12 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 13 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 14 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 15 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 16 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 17 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_NO_RESPONSE
DEBUG: remote_command():
  ssh -o Batchmode=yes -q -o ConnectTimeout=10 192.168.123.20 /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf node status --is-shutdown-cleanly
NOTICE: current primary has been cleanly shut down at location 0/3C000028
DEBUG: local node last receive LSN is 0/3C0000A0, primary shutdown checkpoint LSN is 0/3C000028
NOTICE: promoting standby to primary
DETAIL: promoting server "pg1" (ID: 1) using "/usr/pgsql-12/bin/pg_ctl promote -D /data/pgsql/data"
waiting for server to promote.... done
server promoted
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
DEBUG: setting node 1 as primary and marking existing primary as failed
NOTICE: STANDBY PROMOTE successful
DETAIL: server "pg1" (ID: 1) was successfully promoted to primary
DEBUG: executing:
  /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf  --no-wait -d \'user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433\' node rejoin > /tmp/node-rejoin.1711860263.log 2>&1 && echo "1" || echo "0"
DEBUG: remote_command():
  ssh -o Batchmode=yes -q -o ConnectTimeout=10 192.168.123.20 /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf  --no-wait -d \'user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433\' node rejoin > /tmp/node-rejoin.1711860263.log 2>&1 && echo "1" || echo "0"
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
ERROR: connection to database failed
DETAIL: 
could not connect to server: Connection refused
	Is the server running on host "192.168.123.20" and accepting
	TCP/IP connections on port 15433?

DETAIL: attempted to connect using:
  user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path=
ERROR: execution of "repmgr node rejoin" on demotion candidate "pg2" (ID: 2) failed
DETAIL: check log file "/tmp/node-rejoin.1711860263.log" on "pg2" for details
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
ERROR: connection to database failed
DETAIL: 
could not connect to server: Connection refused
	Is the server running on host "192.168.123.20" and accepting
	TCP/IP connections on port 15433?

DETAIL: attempted to connect using:
  user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path=
INFO: sleeping 1 second; 1 of 60 attempts ("standby_reconnect_timeout") to reconnect to demoted primary
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
NOTICE: replication slot "repmgr_slot_1" deleted on node 2
NOTICE: switchover was successful
DETAIL: node "pg1" is now primary and node "pg2" is attached as standby
DEBUG: unpausing repmgrd on node "pg1" (ID: 1)
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: unpausing repmgrd on node "pg2" (ID: 2)
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
NOTICE: STANDBY SWITCHOVER has completed successfully

在强调一次,需要配置免密才可以使用此功能,因为从切换到主的时候,需要ssh到主做一点事情,切换完毕后,原主也就是新从需要在注册一下,到pg2执行:

-bash-4.2$ repmgr standby register -F
INFO: connecting to local node "pg2" (ID: 2)
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
INFO: connecting to primary database
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
INFO: standby registration complete
NOTICE: standby node "pg2" (ID: 2) successfully registered

再看集群状态就十分正常了:

-bash-4.2$ repmgr service status
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
 ID | Name | Role    | Status    | Upstream | repmgrd | PID  | Paused? | Upstream last seen
----+------+---------+-----------+----------+---------+------+---------+--------------------
 1  | pg1  | primary | * running |          | running | 7631 | no      | n/a                
 2  | pg2  | standby |   running | pg1      | running | 5511 | no      | 0 second(s) ago    

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1558486.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

TonyBai go语言第一课 学习笔记

文章目录 大纲前置篇显式 大纲 前置篇 显式 package main import "fmt" func main() { var a int16 5 var b int 8 var c int64 c a b fmt.Printf("%d\n", c) }如果我们编译这段程序,将得到类似这样的编译器错误:“invalid op…

jmeter性能压测的标准和实战中会遇到的问题

1.性能标准建议 CPU 使用率:不超过 70% 内存使用率:不超过 70% 磁盘:%util到达80%严重繁忙 (os.disIO.filesystem.writeKbPS 每秒写入的千字节) 响应时间:95%的响应时间不超过8000ms 事务成功率&#xff1a…

计算线上点坐标

综上代码 void point_on_line(float* res, float* p, float* q, float lambda) {float tmp 1 - lambda;res[0] tmp* p[0] lambda * q[0];res[1] tmp* p[1] lambda * q[1];res[2] tmp* p[2] lambda * q[2]; }

基于SpringBoot华强北二手手机商城系统

点赞收藏关注 → 私信领取本源代码、数据库 一、项目概述 项目名称:基于SpringBoot华强北二手手机商城 项目架构:B/S架构 开发语言:Java语言 主要技术:SpringBootMybatisMySQL 运行环境:Windows7以上、JDK1.8、M…

【微服务】Nacos(配置中心)

文章目录 1.AP和CP1.基本介绍2.说明 2.Nacos配置中心实例1.架构图2.在Nacos Server加入配置1.配置列表,加号2.加入配置3.点击发布,然后返回4.还可以编辑 3. 创建 Nacos 配置客户端模块获取配置中心信息1.创建子模块 e-commerce-nacos-config-client50002…

【研发日记】Matlab/Simulink开箱报告(十一)——Requirements Toolbox

目录 前言 Requirements Toolbox 编写需求 需求联接设计 需求跟踪开发进度 追溯性矩阵 分析和应用 总结 前言 见《开箱报告,Simulink Toolbox库模块使用指南(六)——S-Fuction模块(TLC)》 见《开箱报告&#x…

弧形导轨在自动化设备中的传动原理

在自动化机械系统中,弧形导轨是一种常见的轨道结构,用于支撑和引导物体沿着指定的弧线运动。其工作原理基于几何学和物理学的原理。 弧形导轨通常由一个弧形的轨道和一个移动部件组成。轨道一般呈弧形,其几何形状可以是圆弧、椭圆弧等&#x…

JavaScript基础语法–变量

文章目录 认识JavaScript变量程序中变量的数据(记录)–变量变量的命名格式在Java script中变量定义包含两部分1. 变量声明(高级JS引擎接下来定义一个变量)2. 其他的写法 变量命名的规范(遵守)变量的练习a. …

使用 eNSP 模拟交换机防御 ARP 泛洪攻击

一、介绍 ARP(地址解析协议)泛洪攻击是一种网络攻击手法,利用 ARP 协议的工作方式来实施。ARP 协议用于将 IP 地址映射到 MAC 地址,以便在局域网中发送数据包。ARP 泛洪攻击的基本原理是向网络中广播大量伪造的 ARP 请求&#xf…

Doris实践——票务平台的实时数仓建设

目录 前言 一、引入 Doris原因 二、基于Doris搭建数据平台 2.1 构建实时数仓 2.2 Flink CDC全库同步 三、基于Doris进行OLAP报表开发 四、未来规划 原文大佬介绍的这篇票务平台的实时数仓建设有借鉴意义,现摘抄下来用作沉淀学习。如有侵权,请告知…

高效解决Visual Studio无法识别到自定义头文件

文章目录 问题解决方案 问题 说明你没有好好配置项目属性 解决方案 把头文件都集中存放到一个文件夹里 之后我会持续更新,如果喜欢我的文章,请记得一键三连哦,点赞关注收藏,你的每一个赞每一份关注每一次收藏都将是我前进路…

【c++】类和对象(六)深入了解隐式类型转换

🔥个人主页:Quitecoder 🔥专栏:c笔记仓 朋友们大家好,本篇文章我们来到初始化列表,隐式类型转换以及explicit的内容 目录 1.初始化列表1.1构造函数体赋值1.2初始化列表1.2.1隐式类型转换与复制初始化 1.3e…

Learning from Multiple Annotator Noisy Labels via Sample-wise Label Fusion

confusion matrix P n ( r ) _n^{(r)} n(r)​ pillow8.3.1和python3.7.11的环境不好满足,不建议复现

笔记: JavaSE day16笔记 - string字符串

第十六天课堂笔记 学习任务 Comparable接口★★★★ 接口 : 功能的封装 > 一组操作规范 一个抽象方法 -> 某一个功能的封装多个抽象方法 -> 一组操作规范 接口与抽象类的区别 1本质不同 接口是功能的封装 , 具有什么功能 > 对象能干什么抽象类是事物本质的抽象 &…

RocketMQ(版本4.9.4)+RocketMQ_Dashbord环境搭建(生产者、消费者的前置环境搭建)

一、官方网站下载 RocketMQ源码包 https://rocketmq.apache.org/zh/docs/4.x/introduction/02quickstart 二、把rocketMQ上传到Linux环境下解压,编译,执行以下命令(需要提前装jdk和maven并配置好环境变量) unzip rocketmq-all-4…

短视频账号矩阵系统/开发 -- -- -- 蒙太奇算法上线

短视频账号矩阵系统,短视频矩阵系统开发3年技术之路,目前已经在技术竞品出沉淀出来,近期技术迭代的新的功能同步喽: php7.4版本,自研框架,有开发文档,类laravel框架 近期剪辑迭代的技术算法&am…

Unix信号处理

信号的基本概念我已经在上一节中简单介绍了&#xff0c;大家可以去看我的上一篇博客&#xff1a; Unix中的进程和线程-2-CSDN博客 1.信号的产生 kill函数&#xff1a; #include <signal.h> #include <fcntl.h> #include<t_stdio.h> //自定义信号处理函数,n为…

深度学习算法概念介绍

前言 深度学习算法是一类基于人工神经网络的机器学习方法&#xff0c;其核心思想是通过多层次的非线性变换&#xff0c;从数据中学习表示层次特征&#xff0c;从而实现对复杂模式的建模和学习。深度学习算法在图像识别、语音识别、自然语言处理等领域取得了巨大的成功&#xf…

深入理解数据结构(3):栈和队列详解

文章主题&#xff1a;顺序表和链表详解&#x1f331;所属专栏&#xff1a;深入理解数据结构&#x1f4d8;作者简介&#xff1a;更新有关深入理解数据结构知识的博主一枚&#xff0c;记录分享自己对数据结构的深入解读。&#x1f604;个人主页&#xff1a;[₽]的个人主页&#x…

吴恩达机器学习笔记 三十一 K-means算法及优化目标 成本函数

随机初始化 K 个集群质心&#xff0c;这里设质心个数为2 第一步&#xff1a;分配点给集群质心 对 m 个 点&#xff0c;每个点对应的质心为 c(i) ,若离红色的质心近&#xff0c;则 c(i) 的值为1&#xff0c;否则为2。计算距离用的是L2范式。 第二步&#xff1a;移动集群质心 …