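[openGauss 5.0 Enterprise Edition: One-Primary, One-Standby Cluster] Operations and Maintenance
- 🔻 I. Maintaining the openGauss 5.0 primary/standby cluster
- 🔰 1.1 Checking the status of a single node
- 🔰 1.2 Checking the status of all nodes in the cluster
- 🔰 1.3 Starting and stopping the cluster
- 🔰 1.4 switchover: swapping the primary and standby roles
- 🔰 1.5 Simulating a primary failure, with the standby in Standby Need repair(Disconnected)
- 🔰 1.6 Simulating dual primaries
- 🔻 Summary and review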
🔻 I. Maintaining the openGauss 5.0 primary/standby cluster
🔰 1.1 Checking the status of a single node
###🟢 Switch to the omm user
[root@pg-node01 ~]# su - omm
###🟢 Check the status of node pg-node01, where pg-node01 is the hostname of the node being queried.
[omm@pg-node01 ~]$ gs_om -t status -h pg-node01
-----------------------------------------------------------------------
cluster_state : Normal
redistributing : No
-----------------------------------------------------------------------
node : 1
node_name : pg-node01
instance_id : 6001
node_ip : 192.168.181.11
data_path : /opt/software/openGauss5.0/install/data/dn
instance_port : 15600
type : Datanode
instance_state : Normal
az_name : AZ1
static_connections : 1
HA_state : Normal
instance_role : Primary
-----------------------------------------------------------------------
[omm@pg-node01 ~]$
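When this check is scripted, a single field such as instance_role is usually all that is needed. The following is a minimal sketch, assuming the output keeps the `key : value` layout shown above. The helper name `status_field` is hypothetical and is not part of openGauss.

```shell
# status_field KEY
# Reads "gs_om -t status -h <host>" output on stdin and prints the value
# of the first "KEY : value" line. Prints nothing if KEY is absent.
# Assumption: one "key : value" pair per line, as in the output above.
status_field() {
    awk -v key="$1" -F':' '
        {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)          # trim the key
            if (k == key) {
                v = substr($0, index($0, ":") + 1)  # everything after the first colon
                gsub(/^[ \t]+|[ \t]+$/, "", v)      # trim the value
                print v
                exit
            }
        }'
}
```

A possible use is `gs_om -t status -h pg-node01 | status_field instance_role`, which prints only the role of the queried node.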
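- ♻️ Node role parameter descriptions
🔰 1.2 Checking the status of all nodes in the cluster
###🟢 Switch to the omm user
[root@pg-node01 ~]# su - omm
###🟢 Check the status of all nodes in the cluster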
[omm@pg-node01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
---------------------------------------------------------------------------------------------------------------
1 pg-node01 192.168.181.11 15600 6001 /opt/software/openGauss5.0/install/data/dn P Primary Normal
2 pg-node02 192.168.181.12 15600 6002 /opt/software/openGauss5.0/install/data/dn S Standby Normal
[omm@pg-node01 ~]$
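- ♻️ Node status parameter descriptions
🔰 1.3 Starting and stopping the cluster
Run these operations as the omm user on any node in the cluster.
###🟢 Stop the openGauss database cluster service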
[omm@pg-node01 ~]$ gs_om -t stop
Stopping cluster.
=========================================
Successfully stopped cluster.
=========================================
End stop cluster.
[omm@pg-node01 ~]$
###🟢 Start the openGauss database cluster service
[omm@pg-node01 ~]$ gs_om -t start
Starting cluster.
=========================================
[SUCCESS] pg-node01
2023-07-06 15:12:27.240 64a7121b.1 [unknown] 139702749823232 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: could not create any HA TCP/IP sockets
2023-07-06 15:12:27.242 64a7121b.1 [unknown] 139702749823232 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (4470 Mbytes) is larger.
[SUCCESS] pg-node02
2023-07-06 15:12:30.915 64a7121e.1 [unknown] 140277375107328 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: could not create any HA TCP/IP sockets
2023-07-06 15:12:30.918 64a7121e.1 [unknown] 140277375107328 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (4470 Mbytes) is larger.
=========================================
Successfully started.
[omm@pg-node01 ~]$
###🟢 Check the status of the openGauss database cluster
[omm@pg-node01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
---------------------------------------------------------------------------------------------------------------
1 pg-node01 192.168.181.11 15600 6001 /opt/software/openGauss5.0/install/data/dn P Primary Normal
2 pg-node02 192.168.181.12 15600 6002 /opt/software/openGauss5.0/install/data/dn S Standby Normal
[omm@pg-node01 ~]$
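After a start, the cluster may take a moment to report Normal, so an automated restart usually polls the status instead of checking it once. The following is a rough sketch of such a loop. The function name `wait_for_cluster_state` and its arguments are hypothetical; the only assumption about openGauss is the `cluster_state : <value>` line shown in the output above.

```shell
# wait_for_cluster_state WANT TRIES DELAY CMD...
# Runs CMD up to TRIES times, DELAY seconds apart, and succeeds as soon as
# its output contains "cluster_state : WANT". Fails if that never happens.
wait_for_cluster_state() {
    want=$1
    tries=$2
    delay=$3
    shift 3
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Extract the value of the cluster_state line from the command output.
        state=$("$@" 2>/dev/null | awk -F':' '
            $1 ~ /^[ \t]*cluster_state[ \t]*$/ {
                gsub(/[ \t]/, "", $2)
                print $2
                exit
            }')
        if [ "$state" = "$want" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `gs_om -t start && wait_for_cluster_state Normal 30 2 gs_om -t status --detail` would wait up to roughly a minute for the cluster to report Normal. The retry count and delay are arbitrary values to tune for your environment.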
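🔰 1.4 switchover: swapping the primary and standby roles
Database node | Role |
---|---|
192.168.181.11 | Primary node |
192.168.181.12 | Standby node |
Here the primary is 192.168.181.11. To promote 192.168.181.12 to primary, run the following on 192.168.181.12 as the omm user: gs_ctl switchover -D /opt/software/openGauss5.0/install/data/dn
###🟢 Step 1: Switch to the omm user (on 192.168.181.12)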
[root@pg-node02 ~]# su - omm
###🟢 Step 2: Switch 192.168.181.12 to the primary role
[omm@pg-node02 ~]$ gs_ctl switchover -D /opt/software/openGauss5.0/install/data/dn
[2023-07-06 15:19:26.007][36453][][gs_ctl]: gs_ctl switchover ,datadir is /opt/software/openGauss5.0/install/data/dn
[2023-07-06 15:19:26.007][36453][][gs_ctl]: switchover term (1)
[2023-07-06 15:19:26.015][36453][][gs_ctl]: waiting for server to switchover........
[2023-07-06 15:19:31.119][36453][][gs_ctl]: done
[2023-07-06 15:19:31.119][36453][][gs_ctl]: switchover completed (/opt/software/openGauss5.0/install/data/dn)
[omm@pg-node02 ~]$
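- ❗ Notes
🟢1. For the same database, a new switchover cannot be run until the previous one has completed.
🟢2. If a switchover is started while workloads are running, threads on the primary may fail to stop in time, so the switchover reports a timeout even though it is still running in the background. It completes once those threads stop. For example, while the primary is dropping a large partitioned table, it may not respond to the signal sent by the switchover.
✅ After a successful switchover or failover, run the following command to record the current primary/standby information:
###🟢 Step 3: Record the current primary/standby information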
[omm@pg-node02 ~]$ gs_om -t refreshconf
Generating dynamic configuration file for all nodes.
Successfully generated dynamic configuration file.
[omm@pg-node02 ~]$
###🟢 Step 4: Verify the switchover (192.168.181.12 has changed from Standby Normal to Primary Normal)
[omm@pg-node02 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
---------------------------------------------------------------------------------------------------------------
1 pg-node01 192.168.181.11 15600 6001 /opt/software/openGauss5.0/install/data/dn P Standby Normal
2 pg-node02 192.168.181.12 15600 6002 /opt/software/openGauss5.0/install/data/dn S Primary Normal
[omm@pg-node02 ~]$
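Note that in the output above the single-letter P/S column did not change after the switchover, so it appears to reflect the originally configured role; the current role is the word that follows it. A small sketch for picking out the current primary from that table is shown below. The function name `current_primary` is hypothetical, and the parsing assumes the whitespace-separated column layout shown above, where the eighth column is the current role.

```shell
# current_primary
# Reads "gs_om -t status --detail" output on stdin and prints the hostname
# of every datanode row whose current role (8th column) is "Primary".
# Assumption: datanode rows start with a numeric node id and have at
# least nine whitespace-separated columns, as in the output above.
current_primary() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 9 && $8 == "Primary" { print $2 }'
}
```

Running `gs_om -t status --detail | current_primary` after the switchover in this section would print pg-node02.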
🔰 1.5 Simulating a primary failure, with the standby in Standby Need repair(Disconnected)
[omm@pg-node02 ~]$ gs_ctl stop -D /opt/software/openGauss5.0/install/data/dn
[2023-07-06 17:03:15.321][17481][][gs_ctl]: gs_ctl stopped ,datadir is /opt/software/openGauss5.0/install/data/dn
waiting for server to shut down.... done
server stopped
[omm@pg-node02 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Unavailable
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
---------------------------------------------------------------------------------------------------------------
1 pg-node01 192.168.181.11 15600 6001 /opt/software/openGauss5.0/install/data/dn P Standby Need repair(Disconnected)
2 pg-node02 192.168.181.12 15600 6002 /opt/software/openGauss5.0/install/data/dn S Down Manually stopped
[omm@pg-node02 ~]$
###🟢 1. The primary and standby can swap roles through switchover. After the primary pg-node02 fails, the standby pg-node01 can be promoted to primary with failover.
[omm@pg-node01 ~]$ gs_ctl failover -D /opt/software/openGauss5.0/install/data/dn
[2023-07-06 17:25:16.619][42471][][gs_ctl]: gs_ctl failover ,datadir is /opt/software/openGauss5.0/install/data/dn
[2023-07-06 17:25:16.619][42471][][gs_ctl]: failover term (1)
[2023-07-06 17:25:16.631][42471][][gs_ctl]: waiting for server to failover...
.[2023-07-06 17:25:17.692][42471][][gs_ctl]: done
[2023-07-06 17:25:17.692][42471][][gs_ctl]: failover completed (/opt/software/openGauss5.0/install/data/dn)
[omm@pg-node01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Degraded
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
---------------------------------------------------------------------------------------------------------------
1 pg-node01 192.168.181.11 15600 6001 /opt/software/openGauss5.0/install/data/dn P Primary Normal
2 pg-node02 192.168.181.12 15600 6002 /opt/software/openGauss5.0/install/data/dn S Down Manually stopped
[omm@pg-node01 ~]$
###🟢 2. Run the following command to start node pg-node02 in standby mode.
[omm@pg-node02 ~]$ gs_ctl start -D /opt/software/openGauss5.0/install/data/dn -M standby
[2023-07-06 17:47:13.138][20533][][gs_ctl]: gs_ctl started,datadir is /opt/software/openGauss5.0/install/data/dn
[2023-07-06 17:47:13.202][20533][][gs_ctl]: waiting for server to start...
.0 LOG: [Alarm Module]can not read GAUSS_WARNING_TYPE env.
0 LOG: [Alarm Module]Host Name: pg-node02
0 LOG: [Alarm Module]Host IP: pg-node02. Copy hostname directly in case of taking 10s to use 'gethostbyname' when /etc/hosts does not contain <HOST IP>
0 LOG: [Alarm Module]Cluster Name: dbCluster
0 LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 58
0 WARNING: failed to open feature control file, please check whether it exists: FileName=gaussdb.version, Errno=2, Errmessage=No such file or directory.
0 WARNING: failed to parse feature control file: gaussdb.version.
0 WARNING: Failed to load the product control file, so gaussdb cannot distinguish product version.
2023-07-06 17:47:13.367 64a73661.1 [unknown] 139657269428480 [unknown] 0 dn_6001_6002 DB010 0 [REDO] LOG: Recovery parallelism, cpu count = 4, max = 4, actual = 4
2023-07-06 17:47:13.367 64a73661.1 [unknown] 139657269428480 [unknown] 0 dn_6001_6002 DB010 0 [REDO] LOG: ConfigRecoveryParallelism, true_max_recovery_parallelism:4, max_recovery_parallelism:4
2023-07-06 17:47:13.382 64a73661.1 [unknown] 139657269428480 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]can not read GAUSS_WARNING_TYPE env.
2023-07-06 17:47:13.382 64a73661.1 [unknown] 139657269428480 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Host Name: pg-node02
2023-07-06 17:47:13.382 64a73661.1 [unknown] 139657269428480 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Host IP: pg-node02. Copy hostname directly in case of taking 10s to use 'gethostbyname' when /etc/hosts does not contain <HOST IP>
2023-07-06 17:47:13.382 64a73661.1 [unknown] 139657269428480 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Cluster Name: dbCluster
2023-07-06 17:47:13.382 64a73661.1 [unknown] 139657269428480 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 58
2023-07-06 17:47:13.387 64a73661.1 [unknown] 139657269428480 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: loaded library "security_plugin"
2023-07-06 17:47:13.428 64a73661.1 [unknown] 139657269428480 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: could not create any HA TCP/IP sockets
2023-07-06 17:47:13.432 64a73661.1 [unknown] 139657269428480 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
2023-07-06 17:47:13.432 64a73661.1 [unknown] 139657269428480 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (4470 Mbytes) is larger.
2023-07-06 17:47:13.525 64a73661.1 [unknown] 139657269428480 [unknown] 0 dn_6001_6002 00000 0 [CACHE] LOG: set data cache size(805306368)
2023-07-06 17:47:13.915 64a73661.1 [unknown] 139657269428480 [unknown] 0 dn_6001_6002 00000 0 [SEGMENT_PAGE] LOG: Segment-page constants: DF_MAP_SIZE: 8156, DF_MAP_BIT_CNT: 65248, DF_MAP_GROUP_EXTENTS: 4175872, IPBLOCK_SIZE: 8168, EXTENTS_PER_IPBLOCK: 1021, IPBLOCK_GROUP_SIZE: 4090, BMT_HEADER_LEVEL0_TOTAL_PAGES: 8323072, BktMapEntryNumberPerBlock: 2038, BktMapBlockNumber: 25, BktBitMaxMapCnt: 512
2023-07-06 17:47:13.973 64a73661.1 [unknown] 139657269428480 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: gaussdb: fsync file "/opt/software/openGauss5.0/install/data/dn/gaussdb.state.temp" success
2023-07-06 17:47:13.974 64a73661.1 [unknown] 139657269428480 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: create gaussdb state file success: db state(STARTING_STATE), server mode(Standby), connection index(1)
2023-07-06 17:47:14.001 64a73661.1 [unknown] 139657269428480 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: max_safe_fds = 974, usable_fds = 1000, already_open = 16
.
[2023-07-06 17:47:15.221][20533][][gs_ctl]: done
[2023-07-06 17:47:15.221][20533][][gs_ctl]: server started (/opt/software/openGauss5.0/install/data/dn)
[omm@pg-node02 ~]$
###🟢 3. Check the cluster status on the primary node pg-node01: the cluster has recovered
[omm@pg-node01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
---------------------------------------------------------------------------------------------------------------
1 pg-node01 192.168.181.11 15600 6001 /opt/software/openGauss5.0/install/data/dn P Primary Normal
2 pg-node02 192.168.181.12 15600 6002 /opt/software/openGauss5.0/install/data/dn S Standby Normal
[omm@pg-node01 ~]$
###🟢 4. Save the primary/standby information
[omm@pg-node01 ~]$ gs_om -t refreshconf
Generating dynamic configuration file for all nodes.
Successfully generated dynamic configuration file.
[omm@pg-node01 ~]$
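During a recovery like this one, it helps to list only the instances that are not healthy. The sketch below does that for the status table; the function name `unhealthy_nodes` is hypothetical, and it assumes the same column layout as the outputs in this section (role in the eighth column, state text from the ninth column onward).

```shell
# unhealthy_nodes
# Reads "gs_om -t status --detail" output on stdin and prints
# "<hostname>: <role> <state>" for every datanode row whose state
# text is anything other than "Normal".
unhealthy_nodes() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 9 {
        # The state can span several words, e.g. "Need repair(Disconnected)".
        state = $9
        for (i = 10; i <= NF; i++) state = state " " $i
        if (state != "Normal") print $2 ": " $8 " " state
    }'
}
```

With the status captured right after the primary was stopped, `gs_om -t status --detail | unhealthy_nodes` would list both pg-node01 and pg-node02; once the cluster has recovered it prints nothing.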
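🔰 1.6 Simulating dual primaries
A dual-primary state is abnormal. Follow the steps below to restore the normal primary/standby state and avoid data loss.
###🟢 1. Run the following command to start the standby node pg-node02 in primary mode.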
[omm@pg-node02 ~]$ gs_ctl start -D /opt/software/openGauss5.0/install/data/dn -M primary
[omm@pg-node02 ~]$
###🟢 2. Check the cluster status on the primary node pg-node01: pg-node01 and pg-node02 are both primary, which is an abnormal state
[omm@pg-node01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Unavailable
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
---------------------------------------------------------------------------------------------------------------
1 pg-node01 192.168.181.11 15600 6001 /opt/software/openGauss5.0/install/data/dn P Primary Normal
2 pg-node02 192.168.181.12 15600 6002 /opt/software/openGauss5.0/install/data/dn S Primary Normal
[omm@pg-node01 ~]$
###🟢 3. Choose the node to demote to standby (pg-node02 here), then run the following command on pg-node02 to stop the service.
[omm@pg-node02 ~]$ gs_ctl stop -D /opt/software/openGauss5.0/install/data/dn
[2023-07-06 17:58:33.883][21761][][gs_ctl]: gs_ctl stopped ,datadir is /opt/software/openGauss5.0/install/data/dn
waiting for server to shut down..... done
server stopped
[omm@pg-node02 ~]$
###🟢 4. Run the following command to start pg-node02 in standby mode.
[omm@pg-node02 ~]$ gs_ctl start -D /opt/software/openGauss5.0/install/data/dn -M standby
###🟢 5. Check the cluster status on the primary node pg-node01: pg-node02 is faulty (Standby Need repair(WAL))
[omm@pg-node01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Degraded
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
---------------------------------------------------------------------------------------------------------------
1 pg-node01 192.168.181.11 15600 6001 /opt/software/openGauss5.0/install/data/dn P Primary Normal
2 pg-node02 192.168.181.12 15600 6002 /opt/software/openGauss5.0/install/data/dn S Standby Need repair(WAL)
[omm@pg-node01 ~]$
###🟢 6. Rebuild the faulty node pg-node02 with the gs_ctl build -D command
[omm@pg-node02 ~]$ gs_ctl build -b auto -D /opt/software/openGauss5.0/install/data/dn
###🟢 7. Check the cluster status on the primary node pg-node01: the cluster has recovered
[omm@pg-node01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
---------------------------------------------------------------------------------------------------------------
1 pg-node01 192.168.181.11 15600 6001 /opt/software/openGauss5.0/install/data/dn P Primary Normal
2 pg-node02 192.168.181.12 15600 6002 /opt/software/openGauss5.0/install/data/dn S Standby Normal
[omm@pg-node01 ~]$
###🟢 8. Save the primary/standby information
[omm@pg-node01 ~]$ gs_om -t refreshconf
Generating dynamic configuration file for all nodes.
Successfully generated dynamic configuration file.
[omm@pg-node01 ~]$
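The dual-primary state seen in step 2 can also be detected in a monitoring script by counting how many rows report the Primary role. A minimal sketch follows; the function name `primary_count` is hypothetical, and it relies on the same assumed column layout as the status outputs above.

```shell
# primary_count
# Reads "gs_om -t status --detail" output on stdin and prints how many
# datanode rows currently report the role "Primary" (8th column).
# More than one means a dual-primary state; zero means no primary at all.
primary_count() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 9 && $8 == "Primary" { n++ }
         END { print n + 0 }'
}
```

A check such as `[ "$(gs_om -t status --detail | primary_count)" -gt 1 ] && echo "dual primary detected"` could then raise an alert before the recovery steps above are started.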
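🔻 Summary and review
❓ This chapter walked through operating and maintaining an openGauss 5.0 Enterprise Edition one-primary, one-standby cluster.
❓ Starting and stopping the openGauss 5.0 Enterprise Edition one-primary, one-standby cluster, and switching the primary and standby roles.
❓ Simulating a primary failure and a dual-primary state on the openGauss 5.0 Enterprise Edition one-primary, one-standby cluster, and restoring the cluster status.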