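In the previous article, "In-Depth | How to Balance Performance and Reliability? A Look at YashanDB's Primary-Standby High Availability Technology", we examined the architectural design and key technologies behind YashanDB high availability. This article is hands-on: it gives a quick tour of YashanDB's primary-standby high availability in practice.
Overview
YashanDB provides automatic failover in several deployment topologies. With one primary and one standby, automatic switchover is driven by OM acting as an external arbiter. With one primary and multiple standbys, it is driven by the Raft protocol. When the primary fails and the timeout is reached, a standby quickly takes over the primary role and continues serving business, so service is interrupted only for a matter of seconds.
This article walks through deploying one primary and one standby, measuring YashanDB's standby synchronization latency, and trying both automatic failover mechanisms. The steps are simple. You can download the latest Personal Edition from the YashanDB website (download.yashandb.com) to try it yourself.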
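Preparation
1 Prerequisites
- The YashanDB installation package
- Three servers (four if available, so that OM can run on a dedicated server)
- SSH service enabled
- The yashan user and user group created
- The HOME and DATA directories created
- The ports required by YashanDB checked to make sure they are not in use
- The test tool benchmarksql-5.0
- Clocks synchronized across all servers, so that the test results are accurate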
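The port check in the list above can be scripted. The sketch below is not a YashanDB tool: `busy_ports` is a hypothetical helper that reads `ss -ltn` output on stdin and prints which of the requested ports already have a TCP listener. Only 1688 and 1689 come from this article (the listen and replication ports generated from `--begin-port 1688`); pass any other ports your deployment uses as well.

```bash
#!/bin/bash
# busy_ports PORT...: hypothetical helper (not part of yasboot).
# Reads `ss -ltn` output on stdin and prints each requested port that
# already has a TCP listener, one per line, sorted and de-duplicated.
busy_ports() {
  awk -v want="$*" '
    BEGIN { n = split(want, w, " "); for (i = 1; i <= n; i++) need[w[i]] = 1 }
    NR > 1 {                  # skip the header line
      p = $4                  # Local Address:Port, e.g. 0.0.0.0:22 or [::]:22
      sub(/.*:/, "", p)       # keep only the text after the last colon
      if (p in need) print p
    }' | sort -un
}

# Usage, on every server before installing:
#   busy=$(ss -ltn | busy_ports 1688 1689)
#   [ -z "$busy" ] && echo "ports free" || echo "ports in use: $busy"
```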
2 Test environment
Server configuration:
Environment information:
3 Create the yashan user
# useradd -d /home/yashan -m yashan
# passwd yashan
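4 Create the installation directories
Both the HOME directory and the DATA directory are placed under /data/yashan, and the yashan user needs full permissions on them. Create the directories and grant the permissions as follows (adjust the paths to your own layout; the yasboot commands in this article use /data1/yashan/yasdb_home and /data1/yashan/yasdb_data):
# mkdir -p /data/yashan/yashan_data /data/yashan/yashan_home
# chown -R yashan:yashan /data/yashan
# chmod -R 770 /data/yashan/yashan_data
# chmod -R 770 /data/yashan/yashan_home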
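5 Download and extract the installation package
Download the latest Personal Edition installation package from the YashanDB website (download.yashandb.com) and extract it.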
Install one primary and one standby
1. Generate the installation configuration files hosts.toml and yashandb.toml:
[yashan@ob1 install]$ yasboot package se gen --cluster yashandb -u yashan -p yashan --ip 192.168.7.10,192.168.7.11 --port 22 --install-path /data1/yashan/yasdb_home --data-path /data1/yashan/yasdb_data --begin-port 1688 --node 2
hostid | group | node_type | node_name | listen_addr | replication_addr | data_path
-------------------------------------------------------------------------------------------------------------
host0001 | dbg1 | db | 1-1 | 192.168.7.10:1688 | 192.168.7.10:1689 | /data1/yashan/yasdb_data
----------+-------+-----------+-----------+-------------------+-------------------+--------------------------
host0002 | dbg1 | db | 1-2 | 192.168.7.11:1688 | 192.168.7.11:1689 | /data1/yashan/yasdb_data
----------+-------+-----------+-----------+-------------------+-------------------+--------------------------
Generate config success
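2. Adjust the configuration file: modify the installation parameters in yashandb.toml as needed. All YashanDB database-creation parameters can be set at the group level, and all YashanDB configuration parameters at the node level. To keep this test stable, the redo files, data files, and archive files each use a separate disk, so their paths need to be changed: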
[group.config]
REDO_FILE_NUM = 10
REDO_FILE_SIZE = "10G"
REDO_FILE_PATH = '/data2/yashan/redo'
[group.node.config]
ARCHIVE_LOCAL_DEST = '/home/yashan/archive'
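Before deploying, it may be worth confirming that the data, redo, and archive paths really do sit on different devices. A minimal sketch, assuming the directories already exist; `check_separate_fs` is a hypothetical helper that compares the device IDs reported by GNU `stat -c %d`. Different device IDs mean different filesystems or partitions, which is not proof of different physical disks.

```bash
#!/bin/bash
# check_separate_fs PATH...: hypothetical helper. Prints every pair of paths
# that share a filesystem device (GNU `stat -c %d`) and returns 1 if any pair
# does, 2 if a path cannot be inspected, 0 otherwise.
# Note: separate device IDs identify separate filesystems/partitions,
# not necessarily separate physical disks.
check_separate_fs() {
  local -a paths=("$@")
  local i j dev_i dev_j rc=0
  for ((i = 0; i < ${#paths[@]}; i++)); do
    dev_i=$(stat -c %d "${paths[i]}") || return 2
    for ((j = i + 1; j < ${#paths[@]}; j++)); do
      dev_j=$(stat -c %d "${paths[j]}") || return 2
      if [ "$dev_i" = "$dev_j" ]; then
        echo "same filesystem: ${paths[i]} ${paths[j]}"
        rc=1
      fi
    done
  done
  return "$rc"
}

# Usage, with the data, redo and archive paths from this article:
#   check_separate_fs /data1/yashan/yasdb_data /data2/yashan/redo /home/yashan/archive
```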
3. Run the installation: install the YashanDB binaries on the other servers and start the management processes yasom and yasagent:
[yashan@ob1 install]$ yasboot package install -t hosts.toml -i yashandb-personal-23.1.1.100-linux-x86_64.tar.gz
checking install package...
install version: yashandb 23.1.1.100
host0001 100% [====================================================================] 3s
host0002 100% [====================================================================] 3s
update host to yasom...
4. Deploy the cluster:
[yashan@ob1 install]$ yasboot cluster deploy -t yashandb.toml
type | uuid | name | hostid | index | status | return_code | progress | cost
------------------------------------------------------------------------------------------------------------
task | e3205df3e98645ed | DeployYasdbCluster | - | yashandb | SUCCESS | 0 | 100 | 174
------+------------------+--------------------+--------+----------+---------+-------------+----------+------
task completed, status: SUCCESS
5. Set the password of the sys user (yashandb_123 in this example):
[yashan@ob1 install]$ yasboot cluster password set --new-password yashandb_123 --cluster yashandb
type | uuid | name | hostid | index | status | return_code | progress | cost
----------------------------------------------------------------------------------------------------------
task | 4e11fb328e1695ac | YasdbPasswordSet | - | yashandb | SUCCESS | 0 | 100 | 3
------+------------------+------------------+--------+----------+---------+-------------+----------+------
task completed, status: SUCCESS
6. Post-installation checks
Check the status of the whole cluster:
[yashan@ob1 install]$ yasboot cluster status --cluster yashandb --detail
hostid | node_type | nodeid | pid | instance_status | database_status | database_role | listen_address | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
host0001 | db | 1-1:1 | 69010 | open | normal | primary | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002 | db | 1-2:2 | 86135 | open | normal | standby | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
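When scripting the later tests, it helps to know which node is currently primary. A sketch that extracts the primary's listen address from the `yasboot cluster status --detail` table above; it assumes the column order shown in this output (database_role in the 7th and listen_address in the 8th `|`-separated field), and `primary_addr` is a hypothetical name.

```bash
#!/bin/bash
# primary_addr: hypothetical helper. Reads `yasboot cluster status --detail`
# output on stdin and prints the listen_address of every node whose
# database_role column is "primary".
primary_addr() {
  awk -F'|' '
    $1 ~ /^[ \t]*host[0-9]/ {              # data rows start with the hostid
      role = $7; gsub(/[ \t]/, "", role)   # database_role column
      addr = $8; gsub(/[ \t]/, "", addr)   # listen_address column
      if (role == "primary") print addr
    }'
}

# Usage:
#   yasboot cluster status --cluster yashandb --detail | primary_addr
```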
Check the connection status between the primary and the standby:
SQL> select DEST_ID, CONNECTION, PEER_ADDR, STATUS, DATABASE_MODE from v$archive_dest_status;
DEST_ID CONNECTION PEER_ADDR STATUS DATABASE_MODE
------- ----------------- ---------------------------------------------------------------- ----------------- -----------------
1 CONNECTED 192.168.7.11:1689 NORMAL OPEN
1 row fetched.
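The same check can be automated. A sketch that reads query output in the layout shown above; `check_dest_status` is a hypothetical helper that flags any destination whose CONNECTION is not CONNECTED or whose STATUS is not NORMAL, and exits non-zero in that case. The usage line assumes `yasql -c` works for this query the same way it does for the UPDATE statements later in this article, and that sys may connect through the listen address; adapt the connection to your environment.

```bash
#!/bin/bash
# check_dest_status: hypothetical helper. Reads v$archive_dest_status query
# output on stdin, prints any destination that is not CONNECTED/NORMAL plus a
# summary line, and exits non-zero if one is unhealthy or none is found.
check_dest_status() {
  awk '
    $1 ~ /^[0-9]+$/ && NF >= 5 {           # data rows start with DEST_ID
      total++
      if ($2 != "CONNECTED" || $4 != "NORMAL") {
        bad++
        print "unhealthy: " $0
      }
    }
    END {
      printf "%d destination(s), %d unhealthy\n", total, bad
      if (bad > 0 || total == 0) exit 1
      exit 0
    }'
}

# Usage (single quotes stop the shell from expanding $archive_dest_status):
#   yasql sys/yashandb_123@192.168.7.10:1688 \
#     -c 'select DEST_ID, CONNECTION, PEER_ADDR, STATUS, DATABASE_MODE from v$archive_dest_status;' \
#     | check_dest_status
```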
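Verify primary-standby synchronization by running a few simple business operations.
7. Tune configuration parameters
Generate recommended parameters based on the server's load: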
SQL> EXEC DBMS_PARAM.OPTIMIZE(NULL, NULL, 90, 90);
PL/SQL Succeed.
View the parameter recommendation report:
SQL> SELECT DBMS_PARAM.SHOW_RECOMMEND() FROM DUAL;
DBMS_PARAM.SHOW_RECO
----------------------------------------------------------------
********** Recommended Settings For HEAP Table ***********
+--------------------------------+-------------+-------------+---------+
| name | current | recommend | restart |
+--------------------------------+-------------+-------------+---------+
| DATA_BUFFER_SIZE | 64M | 272785M | True |
| VM_BUFFER_SIZE | 32M | 34823M | True |
| WORK_AREA_STACK_SIZE | 1024K | 2M | True |
| WORK_AREA_POOL_SIZE | 16M | 128M | True |
| WORK_AREA_HEAP_SIZE | 512K | 512K | True |
| SHARE_POOL_SIZE | 256M | 34823M | True |
| LARGE_POOL_SIZE | 128M | 2048M | True |
| MAX_PARALLEL_WORKERS | 32 | 372 | True |
| SCOL_DATA_BUFFER_SIZE | 128M | 128M | True |
| SCOL_DATA_PRELOADERS | 2 | 2 | True |
| COLUMNAR_WORK_AREA_HEAP_SIZE | 64M | 32M | True |
| COLUMNAR_VM_BUFFER_SIZE | 2G | 128M | True |
| COLUMNAR_BULK_SIZE | 1024 | 1024 | True |
| COMPRESSION | LZ4 | LZ4 | True |
| PQ_POOL_SIZE | 128M | 128M | True |
| MAX_SESSIONS | 1024 | 1024 | True |
| MAX_WORKERS | 0 | 0 | True |
| TAB_QUEUE_WINDOW_SIZE | 4 | 4 | True |
| BLOOM_FILTER_FACTOR | .3 | .3 | True |
| DEGREE_OF_PARALLEL | 1 | 1 | True |
| MMS_DATA_LOADERS | 4 | 8 | True |
| CHECKPOINT_INTERVAL | 100000 | 256M | False |
| CHECKPOINT_TIMEOUT | 300 | 60 | False |
| REDOFILE_IO_MODE | DSYNC | DSYNC | True |
| DATAFILE_IO_MODE | DEFAULT | DEFAULT | True |
| COMMIT_LOGGING | IMMEDIATE | IMMEDIATE | False |
| RECOVERY_PARALLELISM | 16 | 64 | True |
| REDO_BUFFER_SIZE | 64M | 64M | True |
+--------------------------------+-------------+-------------+---------+
| total memory | 346760M |
+--------------------------------+-------------+-------------+---------+
Note: You can execute 'DBMS_PARAM.APPLY_RECOMMEND()' to apply the recommend parameters.
After applying the parameters, you need to restart the database.
1 row fetched.
Write the recommended parameters to the configuration file:
SQL> EXEC DBMS_PARAM.APPLY_RECOMMEND();
PL/SQL Succeed.
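The parameters are instance-level, so this operation must be performed on every node.
8. Enable automatic failover: set FailoverThreshold to 5 and turn on automatic failover: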
[yashan@ob1 install]$ yasboot election config set -k FailoverThreshold -v 5 --cluster yashandb
group 1 execute Succeed
[yashan@ob1 install]$ yasboot election enable on -c yashandb
group 1 execute Succeed
[yashan@ob1 install]$ yasboot election config show --cluster yashandb
group 1
Protection Mode: MAXIMUM PROTECTION
Members:
[1-1:1] - Primary database
[1-2:2] - Physical standby database
Transport Lag: 0 seconds
Apply Lag: 0 seconds
Apply Rate: 2.73 MByte/s
Properties:
FailoverThreshold = 5
FailoverAutoReinstate = false
ZeroDataLossMode = true
Automatic Failover: Enabled in Zero Data Loss Mode
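To verify these settings from a script, a sketch that pulls a single property out of the `yasboot election config show` output above. It assumes the `Key = value` layout shown, and `election_prop` is a hypothetical name.

```bash
#!/bin/bash
# election_prop KEY: hypothetical helper. Reads `yasboot election config show`
# output on stdin and prints the value from the first "KEY = value" line.
election_prop() {
  awk -v k="$1" '
    { line = $0; sub(/^[ \t]+/, "", line) }   # ignore leading indentation
    index(line, k " = ") == 1 {
      print substr(line, length(k) + 4)        # text after "KEY = "
      exit
    }'
}

# Usage:
#   yasboot election config show --cluster yashandb | election_prop FailoverThreshold
#   yasboot election config show --cluster yashandb | grep 'Automatic Failover:'
```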
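Test: standby synchronization latency of 8 ms
1 Test plan
- On the primary, create a table with create table ha_test(time_col timestamp) and insert one row into it.
- Get the local timestamp, update the row with it, and commit. Repeat this continuously.
- On the standby, query the table. The difference between the time at which the query is executed and the timestamp stored in the row is the primary-standby synchronization latency. (The table holds a single row, so the execution time of the UPDATE and SELECT themselves is negligible.)
2 Test steps
1. Prepare a TPC-C stress test (the YashanDB website has detailed instructions on running TPC-C).
2. Configure TPC-C with 300 warehouses and 128 concurrent connections, which drives load on the order of a million tpmC, and run the measurement under this load.
3. Run the test scripts on the primary and the standby respectively (100 measurements in total).
4. From the data recorded by the scripts, compute the time difference between the primary and the standby.
Test scripts: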
#!/bin/bash
# Run on the primary: update the row in ha_test
# Update 100 times
for ((i=1; i<=100; i++))
do
  # Get the current time in a format the database accepts
  current_time=$(date +'%Y-%m-%d %H:%M:%S.%3N')
  echo "Current time is: $current_time"
  # Update the row in ha_test
  yasql ha_test/123@192.168.7.10:1688 -c "UPDATE ha_test SET time_col='$current_time';"
  sleep 0.1
done
#!/bin/bash
# Run on the standby: query the row
while true
do
  # Get the current time
  current_time=$(date +'%Y-%m-%d %H:%M:%S.%3N')
  echo "Current time is: $current_time"
  # Query the time column of ha_test
  yasql ha_test/123@192.168.7.11:1688 -c "select time_col from ha_test;"
done
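The time difference in step 4 is the standby's query time minus the timestamp read from the row. A minimal sketch for averaging it, assuming each sample has already been extracted from the two scripts' output into a line of the form `query_time,row_time` (a hypothetical intermediate format; the exact way yasql displays TIMESTAMP values is not shown here). It relies on GNU `date` to convert timestamps to epoch milliseconds, and because the two times come from different hosts, the result is only meaningful with the clock synchronization listed in the prerequisites.

```bash
#!/bin/bash
# avg_lag_ms: hypothetical helper. Reads lines "query_time,row_time" on stdin,
# where query_time is when the standby ran the SELECT and row_time is the
# value it read from ha_test.time_col, and prints the average difference in ms.
# Requires GNU date (for -d and %3N).
avg_lag_ms() {
  local q r q_ms r_ms total=0 n=0
  while IFS=, read -r q r || [ -n "$q" ]; do
    [ -z "$q" ] && continue                   # skip blank lines
    q_ms=$(date -d "$q" +%s%3N) || return 1   # epoch milliseconds
    r_ms=$(date -d "$r" +%s%3N) || return 1
    total=$(( total + q_ms - r_ms ))
    n=$(( n + 1 ))
  done
  if [ "$n" -eq 0 ]; then
    echo "no samples" >&2
    return 1
  fi
  echo "samples=$n avg_ms=$(( total / n ))"   # integer (truncated) average
}

# Usage:
#   avg_lag_ms < samples.csv
```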
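3 Test results
- Redo flush rate during the test (from V$REDOSTAT): 235 MB/s
- Average standby query latency: 8 ms
Five of the 100 measurements are listed below:
Test: arbitration-based automatic failover, RTO < 8 s
RTO is calculated as the time between the interruption of business on the old primary and the first successful business operation on the new primary.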
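With this definition, RTO is a subtraction of two timestamps. A sketch using GNU `date`, assuming both timestamps use the `YYYY-MM-DD HH:MM:SS.mmm` format printed by the test scripts and were taken on hosts whose clocks are synchronized and in the same time zone; `rto_ms` is a hypothetical helper.

```bash
#!/bin/bash
# rto_ms FAIL_TS OK_TS: hypothetical helper. Prints OK_TS - FAIL_TS in
# milliseconds for timestamps like "2024-03-19 16:31:45.322".
# Requires GNU date; both timestamps are read in the local time zone.
rto_ms() {
  local start end
  start=$(date -d "$1" +%s%3N) || return 1
  end=$(date -d "$2" +%s%3N) || return 1
  echo $(( end - start ))
}
```

For example, with the Raft test's timestamps from this article (old primary's failed write at 2024-03-19 16:31:45.322, new primary's first successful write at 16:31:53.257), `rto_ms '2024-03-19 16:31:45.322' '2024-03-19 16:31:53.257'` prints 7935, the 7.935 s reported for that test.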
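1 Test steps
1. Keep the stress scenario running (TPC-C load) for about 10 minutes.
2. Record when business is interrupted on the primary and when business first succeeds on the new primary.
3. Run the probe script on both the primary and the standby.
4. Kill the primary's database process to interrupt its business.
Test script: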
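#!/bin/bash
# Loop forever
while true
do
  # Get the current time in a format the database accepts
  current_time=$(date +'%Y-%m-%d %H:%M:%S.%3N')
  # Print the current time
  echo "Current time is: $current_time"
  # Perform a write; when running on the standby, use 192.168.7.11:1688 instead
  yasql ha_test/123@192.168.7.10:1688 -c "UPDATE ha_test SET time_col='$current_time';"
done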
2 Test results
Cluster status before the test:
[yashan@ob1 install]$ yasboot cluster status --cluster yashandb --detail
hostid | node_type | nodeid | pid | instance_status | database_status | database_role | listen_address | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
host0001 | db | 1-1:1 | 69010 | open | normal | primary | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002 | db | 1-2:2 | 86135 | open | normal | standby | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
Cluster status after killing the primary (the standby has become the primary):
[yashan@ob1 sync_test]$ yasboot cluster status --cluster yashandb --detail
hostid | node_type | nodeid | pid | instance_status | database_status | database_role | listen_address | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
host0001 | db | 1-1:1 | off | - | - | - | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002 | db | 1-2:2 | 86135 | open | normal | primary | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
Timestamps at which business on the old primary was interrupted:
Current time is: 2024-03-19 15:45:38.464
SQL> UPDATE ha_test SET time_col='2024-03-19 15:45:38.464';
1 row affected.
Current time is: 2024-03-19 15:45:38.476
SQL> UPDATE ha_test SET time_col='2024-03-19 15:45:38.476';
YAS-00406 connection is closed
Time at which business first succeeded on the new primary:
Current time is: 2024-03-19 15:45:46.204
SQL> UPDATE ha_test SET time_col='2024-03-19 15:45:46.204';
YAS-06010 the database is not in readwrite mode
Current time is: 2024-03-19 15:45:46.211
SQL> UPDATE ha_test SET time_col='2024-03-19 15:45:46.211';
1 row affected.
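Rather than reading the two logs by eye, the relevant timestamps can be extracted automatically. A sketch, assuming the log format shown above (a `Current time is: <ts>` line followed by yasql output, where a successful write prints `1 row affected.`); `parse_probe_log` is a hypothetical name. For the old primary's log, `first_fail` is the interruption time; for the new primary's log, `first_ok_after_fail` is the first successful write. The RTO is the second minus the first.

```bash
#!/bin/bash
# parse_probe_log: hypothetical helper. Reads the probe script output
# ("Current time is: <ts>" followed by yasql output) on stdin and prints
#   first_fail=<ts of the first attempt without "1 row affected.">
#   first_ok_after_fail=<ts of the first successful attempt after that>
parse_probe_log() {
  awk '
    function judge() {                  # classify the attempt stamped ts
      if (!ok && first_fail == "") first_fail = ts
      else if (ok && first_fail != "" && recovered == "") recovered = ts
    }
    /^Current time is: / {
      if (ts != "") judge()             # close the previous attempt
      ts = substr($0, 18); ok = 0       # text after "Current time is: "
      next
    }
    /^1 row affected\./ { ok = 1 }
    END {
      if (ts != "") judge()
      print "first_fail=" first_fail
      print "first_ok_after_fail=" recovered
    }'
}

# Usage:
#   parse_probe_log < old_primary.log
#   parse_probe_log < new_primary.log
```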
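3 Test summary
- Heartbeat interval (configured): 1 s
- Check timeout (configured): 5 s
- Current redo flush rate: 237 MB/s
- Service interruption time: 7.745 s
- Failover time: under 3 s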
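A rough way to read these numbers, assuming failure detection takes about the configured 5 s check timeout and the rest of the interruption is the role switch itself (the 1 s heartbeat interval can add some slack to detection):

$$
T_{\text{interrupt}} \approx T_{\text{detect}} + T_{\text{failover}}, \qquad T_{\text{failover}} \approx 7.745\,\text{s} - 5\,\text{s} = 2.745\,\text{s} < 3\,\text{s}
$$

So as long as the failover itself stays under 3 s, the interruption stays below 5 s + 3 s = 8 s, consistent with the RTO < 8 s target of this test.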
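Deploy one primary with two standbys: add a standby online
1. Restore the environment and disable arbitration-based automatic failover, which applies only to one-primary-one-standby configurations: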
[yashan@ob1 yasdb_home]$ yasboot election enable off -c yashandb
group 1 execute Succeed
[yashan@ob1 yasdb_home]$ yasboot election config show --cluster yashandb
group 1
Protection Mode: MAXIMUM PROTECTION
Members:
[1-2:2] - Primary database
[1-1:1] - Physical standby database
Transport Lag: 0 seconds
Apply Lag: 0 seconds
Apply Rate: 391.00 MByte/s
Properties:
FailoverThreshold = 5
FailoverAutoReinstate = false
ZeroDataLossMode = true
Automatic Failover: DISABLED
[yashan@ob1 yasdb_home]$ yasboot cluster status --cluster yashandb --detail
hostid | node_type | nodeid | pid | instance_status | database_status | database_role | listen_address | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
host0001 | db | 1-1:1 | 14818 | open | normal | primary | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002 | db | 1-2:2 | 86135 | open | normal | standby | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
2. Generate the configuration files hosts_add.toml and yashandb_add.toml:
[yashan@ob1 install]$ yasboot config node gen -c yashandb -u yashan -p yashan --ip 192.168.7.12 --port 22 --data-path /data1/yashan/yasdb_data --install-path /data1/yashan/yasdb_home -g 1 --node 1
hostid | group | node_type | node_name | listen_addr | replication_addr | data_path
-------------------------------------------------------------------------------------------------------------
host0003 | dbg1 | db | 1-3 | 192.168.7.12:1688 | 192.168.7.12:1689 | /data1/yashan/yasdb_data
----------+-------+-----------+-----------+-------------------+-------------------+--------------------------
Generate config success
3. Run the installation: install the YashanDB binaries on the new node's server and start the yasagent process:
[yashan@ob1 install]$ yasboot host add -c yashandb -i yashandb-personal-23.1.1.100-linux-x86_64.tar.gz -t hosts_add.toml
type | uuid | name | hostid | index | status | return_code | progress | cost
-------------------------------------------------------------------------------------------------
task | 63112e698b5689a0 | HostAdd | - | yashandb | SUCCESS | 0 | 100 | 8
------+------------------+---------+--------+----------+---------+-------------+----------+------
task completed, status: SUCCESS
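4. Add the standby. A SUCCESS status here does not mean the scale-out is complete, because background tasks are still synchronizing data and performing other work: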
[yashan@ob1 install]$ yasboot node add -c yashandb -t yashandb_add.toml
type | uuid | name | hostid | index | status | return_code | progress | cost
-------------------------------------------------------------------------------------------------
task | 4618495ddc9c012c | NodeAdd | - | yashandb | SUCCESS | 0 | 100 | 10
------+------------------+---------+--------+----------+---------+-------------+----------+------
task completed, status: SUCCESS
5. Wait for the scale-out tasks to finish:
[yashan@ob1 install]$ yasboot task list -c yashandb --search type=NodeAdd
uuid | name | type | index | hostid | status | ret_code | progress | created_at | cost
-------------------------------------------------------------------------------------------------------------------------------------------------
ecff3c2c4b452ce1 | AddDBAlterHA | NodeAdd | yashandb | - | SUCCESS | 0 | 100 | 2024-03-19 16:04:36 | 1
------------------+-----------------------------+---------+--------------+----------+---------+----------+----------+---------------------+------
8d8146ab5fff3423 | BuildDatabaseToMultiAddress | NodeAdd | yashandb.1-1 | host0001 | SUCCESS | 0 | 100 | 2024-03-19 16:04:36 | 760
------------------+-----------------------------+---------+--------------+----------+---------+----------+----------+---------------------+------
4618495ddc9c012c | NodeAdd | NodeAdd | yashandb | - | SUCCESS | 0 | 100 | 2024-03-19 16:04:36 | 10
------------------+-----------------------------+---------+--------------+----------+---------+----------+----------+---------------------+------
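Instead of re-running the task list by hand, a polling loop can wait until every NodeAdd task reports SUCCESS. A sketch, assuming the `|`-separated layout shown above (status in the 6th column). Because the other possible status values are not shown in this article, every status other than SUCCESS is treated as unfinished, so a failed task simply keeps the loop waiting until the attempt limit; check the task list manually if it gives up. Both function names are hypothetical.

```bash
#!/bin/bash
# nodeadd_tasks_done: hypothetical helper. Reads `yasboot task list` output on
# stdin; exits 0 only if at least one task row exists and every row's status
# column (6th |-separated field) is SUCCESS.
nodeadd_tasks_done() {
  awk -F'|' '
    { id = $1; gsub(/[ \t]/, "", id) }
    id ~ /^[0-9a-f]+$/ && NF >= 6 {      # task rows start with a hex uuid
      st = $6; gsub(/[ \t]/, "", st)
      total++
      if (st != "SUCCESS") pending++
    }
    END { if (total == 0 || pending > 0) exit 1; exit 0 }'
}

# wait_for_nodeadd [interval_seconds] [max_attempts]: poll until all NodeAdd
# tasks of cluster "yashandb" report SUCCESS, or give up after max_attempts.
wait_for_nodeadd() {
  local interval=${1:-30} max=${2:-120} i
  for ((i = 1; i <= max; i++)); do
    if yasboot task list -c yashandb --search type=NodeAdd | nodeadd_tasks_done; then
      echo "all NodeAdd tasks report SUCCESS"
      return 0
    fi
    echo "NodeAdd tasks not finished yet (attempt $i/$max)"
    sleep "$interval"
  done
  echo "gave up waiting; check yasboot task list manually" >&2
  return 1
}
```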
6. Post-installation checks
Check the cluster status:
[yashan@ob1 install]$ yasboot cluster status --cluster yashandb --detail
hostid | node_type | nodeid | pid | instance_status | database_status | database_role | listen_address | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
host0001 | db | 1-1:1 | 14818 | open | normal | primary | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002 | db | 1-2:2 | 86135 | open | normal | standby | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0003 | db | 1-3:3 | 14944 | open | normal | standby | 192.168.7.12:1688 | /data1/yashan/yasdb_data/db-1-3
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
Check the connection status between the primary and the standbys:
SQL> select DEST_ID, CONNECTION, PEER_ADDR, STATUS, DATABASE_MODE from v$archive_dest_status;
DEST_ID CONNECTION PEER_ADDR STATUS DATABASE_MODE
------- ----------------- ---------------------------------------------------------------- ----------------- -----------------
1 CONNECTED 192.168.7.11:1689 NORMAL OPEN
2 CONNECTED 192.168.7.12:1689 NORMAL OPEN
2 rows fetched.
7. Enable Raft-based automatic failover:
[yashan@ob1 install]$ yasboot cluster config set -c yashandb -k HA_ELECTION_ENABLED -v true
type | uuid | name | hostid | index | status | return_code | progress | cost
--------------------------------------------------------------------------------------------------------------
task | cc2a1364200f86e8 | YasdbConfigSetParent | - | yashandb | SUCCESS | 0 | 100 | 1
------+------------------+----------------------+--------+----------+---------+-------------+----------+------
task completed, status: SUCCESS
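Test: Raft-based automatic failover, RTO < 8 s
1 Test steps
The steps are the same as for the arbitration-based failover test and are not repeated here.
2 Test results
Cluster status before the test: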
[yashan@ob1 install]$ yasboot cluster status --cluster yashandb --detail
hostid | node_type | nodeid | pid | instance_status | database_status | database_role | listen_address | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
host0001 | db | 1-1:1 | 14818 | open | normal | primary | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002 | db | 1-2:2 | 86135 | open | normal | standby | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0003 | db | 1-3:3 | 14944 | open | normal | standby | 192.168.7.12:1688 | /data1/yashan/yasdb_data/db-1-3
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
Cluster status after killing the primary (a standby has become the primary):
[yashan@ob1 install]$ yasboot cluster status --cluster yashandb --detail
hostid | node_type | nodeid | pid | instance_status | database_status | database_role | listen_address | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
host0001 | db | 1-1:1 | off | - | - | - | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002 | db | 1-2:2 | 86135 | open | normal | primary | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0003 | db | 1-3:3 | 14944 | open | normal | standby | 192.168.7.12:1688 | /data1/yashan/yasdb_data/db-1-3
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
Timestamps at which business on the old primary was interrupted:
Current time is: 2024-03-19 16:31:45.309
SQL> UPDATE ha_test SET time_col='2024-03-19 16:31:45.309';
1 row affected.
Current time is: 2024-03-19 16:31:45.322
SQL> UPDATE ha_test SET time_col='2024-03-19 16:31:45.322';
YAS-00406 connection is closed
Time at which business first succeeded on the new primary:
Current time is: 2024-03-19 16:31:53.250
SQL> UPDATE ha_test SET time_col='2024-03-19 16:31:53.250';
YAS-06010 the database is not in readwrite mode
Current time is: 2024-03-19 16:31:53.257
SQL> UPDATE ha_test SET time_col='2024-03-19 16:31:53.257';
1 row affected.
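3 Test summary
- Heartbeat interval (configured): 1 s
- Check timeout (configured): 5 s
- Current redo flush rate: 237 MB/s
- Service interruption time: 7.935 s
- Failover time: under 3 s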
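Conclusion
This article demonstrated how to deploy YashanDB primary-standby high availability and how to set up both types of automatic failover. In these tests, standby synchronization latency averaged 8 ms under a TPC-C load on the order of a million tpmC, and both arbitration-based and Raft-based automatic failover restored service in under 8 s. You are welcome to download YashanDB from the YashanDB website (download.yashandb.com) and try it. If you run into any problems during deployment, join the YashanDB technical community group to share feedback; our technical experts will be glad to help.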