Replacing OceanBase Cloud Platform Nodes with the ANTMAN Tool

2024/12/29 11:07:14

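OceanBase environments generally start by installing OCP, which is then used to deploy, monitor, and operate the database clusters. When a machine goes out of warranty or a similar issue comes up, you need a smooth way to replace an OCP node.

Author: Zhang Ruiyuan (张瑞远)

DBA at a company in Shanghai. Previously worked on data warehouse design, development, and optimization for banks and securities firms; now mainly works on carrier-grade IT systems and databases. More than three years of OceanBase experience. Certifications include OceanBase OBCP, Oracle OCP 11g, Oracle OCM 11g, and MySQL OCP 5.7.

Source: original contribution

  • Produced by the ActionTech open source community (爱可生开源社区). Original content may not be used without authorization; to reprint it, contact the editor and credit the source.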

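Preface

OceanBase Cloud Platform (OCP) is an enterprise-grade database management platform built around OceanBase.

In our production environments we almost always set up OCP first and then rely on it to create, manage, and monitor the production clusters, so installing OCP is usually the first step of bringing a system online. Later on, data center planning and similar concerns can make it necessary to relocate or replace the OCP machines.

This series covers two ways to replace an OCP node: one uses the OAT platform (OceanBase Admin Toolkit), the other uses the ANTMAN tool. The previous article covered the OAT approach; this one covers ANTMAN.

PS: In my environment OCP is load-balanced by an F5, so the new machine has to be configured on the F5 first. The same applies to other load-balancing setups.

Environment background

If you have worked with OB in production, you may know that early versions required three Docker packages when installing OCP: the OCP software, metadb, and obproxy. Later OCP versions bundle DB and Proxy into a single Docker package. OAT can only manage a metadb in which DB and Proxy are bundled; when they are separate, the replacement still has to be done with ANTMAN.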

Software information

Software        Version
OCP             ocp-all-in-one:3.3.3-20220906114643
metadb+proxy    OB2277_OBP320_x86_20220429
Proxy           4.1.1_20230519_x86
antman          t-oceanbase-antman-1.4.3-20220807073355.alios7.x86_64

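Procedure

3.1 Environment check and preparation

Prepare the replacement machine: partition the disks, create the admin user, install Docker, and so on. Once that is done, run the precheck.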

cd /root/t-oceanbase-antman/clonescripts/
sh precheck.sh -m ocp

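Log in to the meta database and check whether any tenant has its primary zone on the node being replaced. If so, switch the leader away beforehand.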

MySQL [oceanbase]> select * from __all_Server;
+----------------------------+----------------------------+--------------+----------+----+----------------+------------+-----------------+--------+-----------------------+--------------------------------------------------------------------------------------+-----------+--------------------+--------------+----------------+-------------------+
| gmt_create                 | gmt_modified               | svr_ip       | svr_port | id | zone           | inner_port | with_rootserver | status | block_migrate_in_time | build_version                                                                        | stop_time | start_service_time | first_sessid | with_partition | last_offline_time |
+----------------------------+----------------------------+--------------+----------+----+----------------+------------+-----------------+--------+-----------------------+--------------------------------------------------------------------------------------+-----------+--------------------+--------------+----------------+-------------------+
| 2022-03-17 22:59:19.979627 | 2023-03-20 10:27:01.147283 | 10.10.100.87 |     2882 |  6 | META_OB_ZONE_2 |       2881 |               0 | active |                     0 | 2.2.76_20210406232249-a1e144bdc179fbf473cea37f199e8a76c736b8d4(Apr  6 2021 23:55:12) |         0 |   1679279220991796 |            0 |              1 |  1679278517144838 |
| 2022-03-17 23:54:49.277939 | 2023-03-20 09:29:22.079578 | 10.10.100.9  |     2882 |  7 | META_OB_ZONE_1 |       2881 |               1 | active |                     0 | 2.2.76_20210406232249-a1e144bdc179fbf473cea37f199e8a76c736b8d4(Apr  6 2021 23:55:12) |         0 |   1679275725595691 |            0 |              1 |                 0 |
| 2021-12-21 22:44:16.476503 | 2023-03-20 09:29:22.080425 | 122.44.11.2  |     2882 |  5 | META_OB_ZONE_3 |       2881 |               0 | active |                     0 | 2.2.76_20210406232249-a1e144bdc179fbf473cea37f199e8a76c736b8d4(Apr  6 2021 23:55:12) |         0 |   1640097866698859 |            0 |              1 |                 0 |
+----------------------------+----------------------------+--------------+----------+----+----------------+------------+-----------------+--------+-----------------------+--------------------------------------------------------------------------------------+-----------+--------------------+--------------+----------------+-------------------+
MySQL [oceanbase]> select tenant_name,primary_zone  from __all_tenant;
+----------------+----------------------------------------------+
| tenant_name    | primary_zone                                 |
+----------------+----------------------------------------------+
| sys            | META_OB_ZONE_1;META_OB_ZONE_3;META_OB_ZONE_2 |
| ocp_meta       | META_OB_ZONE_1;META_OB_ZONE_3;META_OB_ZONE_2 |
| ocp_monitor    | META_OB_ZONE_1;META_OB_ZONE_3;META_OB_ZONE_2 |
| oms_tt_tenant  | META_OB_ZONE_1;META_OB_ZONE_3;META_OB_ZONE_2 |
| oms_cc7_tenant | META_OB_ZONE_3;META_OB_ZONE_2;META_OB_ZONE_1 |
| oms_ff9_tenant | META_OB_ZONE_1;META_OB_ZONE_2,META_OB_ZONE_3 |
| oms_cc9_tenant | META_OB_ZONE_3;META_OB_ZONE_1,META_OB_ZONE_2 |
| oms_dd_tenant  | META_OB_ZONE_3;META_OB_ZONE_1,META_OB_ZONE_2 |
| obdw_meta      | META_OB_ZONE_3;META_OB_ZONE_1,META_OB_ZONE_2 |
+----------------+----------------------------------------------+
MySQL [oceanbase]> alter tenant sys primary_zone='META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1';
Query OK, 0 rows affected (0.04 sec)

MySQL [oceanbase]> alter tenant ocp_meta primary_zone='META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1';
Query OK, 0 rows affected (1.27 sec)

MySQL [oceanbase]> alter tenant ocp_monitor primary_zone='META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1';
Query OK, 0 rows affected (0.02 sec)

MySQL [oceanbase]> alter tenant oms_tt_tenant primary_zone='META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1';
Query OK, 0 rows affected (0.03 sec)

MySQL [oceanbase]> alter tenant oms_ff9_tenant primary_zone='META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1';
Query OK, 0 rows affected (0.03 sec)
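
When there are many tenants, the manual step above can be scripted. The following is a minimal sketch and not part of ANTMAN: gen_switch_sql is a hypothetical helper name, and it assumes its input is tab-separated tenant_name and primary_zone rows, with ';' separating priority levels and ',' separating zones of equal priority. It only prints the statements so they can be reviewed before being run.

```shell
#!/usr/bin/env bash
# gen_switch_sql OLD_ZONE NEW_PRIMARY_ZONE   (hypothetical helper)
# Reads "tenant_name<TAB>primary_zone" rows on stdin and prints an
# ALTER TENANT statement for every tenant whose highest-priority
# zone group contains OLD_ZONE.
gen_switch_sql() {
  local old_zone="$1" new_primary="$2"
  awk -F'\t' -v old="$old_zone" -v np="$new_primary" '
    {
      # Priority levels are separated by ";"; only the first level matters here.
      split($2, groups, ";")
      # Zones inside one level are separated by ",".
      m = split(groups[1], top, ",")
      for (i = 1; i <= m; i++) {
        if (top[i] == old) {
          # \047 is a single quote.
          printf "alter tenant %s primary_zone=\047%s\047;\n", $1, np
          break
        }
      }
    }'
}
```

Assuming the rows come from something like mysql -N -B -e "select tenant_name, primary_zone from __all_tenant" (connection options omitted), the call for this walkthrough would be gen_switch_sql META_OB_ZONE_1 'META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1'.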

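Because the migration is done with ANTMAN, the obcluster.conf file on the machine that runs the tool has to be edited, or copied over from the original OCP host and then checked. The image packages also need to be uploaded to the /root/t-oceanbase-antman directory on that machine.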

55obffocp:~/t-oceanbase-antman # cat obcluster.conf
ZONE1_RS_IP=10.10.100.9
ZONE2_RS_IP=10.10.100.87
ZONE3_RS_IP=122.44.11.2

######  Auto-configured, no need to modify / AUTO-CONFIGURATION ######

OBSERVER01_HOSTNAME=OCP_META_SERVER_1
OBSERVER02_HOSTNAME=OCP_META_SERVER_2
OBSERVER03_HOSTNAME=OCP_META_SERVER_3
ZONE1_NAME=META_OB_ZONE_1                       -- the arguments of later commands must match the values in this config file
ZONE2_NAME=META_OB_ZONE_2
ZONE3_NAME=META_OB_ZONE_3
##there must be more than half zone within same region
ZONE1_REGION=OCP_META_REGION
ZONE2_REGION=OCP_META_REGION
ZONE3_REGION=OCP_META_REGION
MYSQL_PORT=2881
RPC_PORT=2882

OCP_VERSION=3.3.3

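Check that the default cluster passwords stored on the machine running the ANTMAN scripts are correct: cd ~/t-oceanbase-antman/tools and run getpass.sh. If they are wrong, fix them with setpass.sh, because the passwords are verified after the Proxy Docker is migrated and again before the OCP Docker is migrated.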

55obffocp:~/t-oceanbase-antman/tools # bash setpass.sh -s 0Aa255yK^F  
password file sys in /root/.key already exist!
**********************
Password of root@sys is  CqVgg9}Aut
Password of root@ocp_meta is  r6kS^EINTU
Password of root@ocp_monitor is  pkJv1a{7J7
Password of root@odc is  j{fjdd3X9f
Password of root@oms is  {oOIsE9fdQ

55obffocp:~ # mv .key  .key_bak
55obffocp:~ # cd /root/t-oceanbase-antman/tools/
55obffocp:~/t-oceanbase-antman/tools # bash setpass.sh -s 0Aa255yK^F
**********************
Password of root@sys is  0Aa255yK^F
Password of root@ocp_meta is 
Password of root@ocp_monitor is 
Password of root@odc is 
Password of root@oms is 
55obffocp:~/t-oceanbase-antman/tools # bash setpass.sh -c rSf@jO%6EO
**********************
Password of root@sys is  0Aa255yK^F
Password of root@ocp_meta is  rSf@jO%6EO
Password of root@ocp_monitor is 
Password of root@odc is 
Password of root@oms is

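3.2 Adding the new machine

Run ANTMAN's manage script to add the new machine.

PS: In this version the manage script fails with an error; see the troubleshooting notes at the end of the article.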

55obffocp:~/t-oceanbase-antman # ./manage.sh -i ob,ocp,obproxy -l 133.55.22.19 -z 1 -R Jnydzycscc@123 -A OceanBase#123
[2023-06-16 16:31:45.375633] INFO [check conf file /root/t-oceanbase-antman/obcluster.conf format ...]
[2023-06-16 16:31:45.381844] INFO [conf file is upper case format.]
[2023-06-16 16:31:45.391446] INFO [SSH_AUTH=password SSH_USER=root SSH_PORT=22 SSH_PASSWORD= SSH_KEY_FILE=/root/.ssh/id_rsa]
LB_MODE=f5
INSTALL_COMPONENTS componets: ob obproxy ocp
CLEAR_COMPONENTS: 
IP_LIST: 133.55.22.19
ZONE_LIST: 1
ROOT_PASSWORD_LIST: Jnydzycscc@123
ADMIN_PASSWORD_LIST: OceanBase#123
[2023-06-16 16:31:45.746503] INFO [INSTALL_COMPONENT: ob START  ######################################]
[2023-06-16 16:31:45.751057] INFO [deploy_ob: check whether OBSERVER port 2881,2882 are in use or not on 133.55.22.19]
[2023-06-16 16:31:45.806500] INFO [deploy_ob: OBSERVER port 2881,2882 are idle on 133.55.22.19]
[2023-06-16 16:31:45.810773] INFO [deploy_ob: installing ob cluster, logfile: /root/t-oceanbase-antman/logs/deploy_ob.log]
cp: '/root/t-oceanbase-antman/OB2276_x86_20210409.tar.gz' and '/root/t-oceanbase-antman/OB2276_x86_20210409.tar.gz' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/install_OB_docker.sh' and '/root/t-oceanbase-antman/install_OB_docker.sh' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/obcluster.conf' and '/root/t-oceanbase-antman/obcluster.conf' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/common/utils.sh' and '/root/t-oceanbase-antman/common/utils.sh' are the same file
skip copy same file
cp: '/root/.key' and '/root/.key' are the same file
skip copy same file
nohup: ignoring input
[2023-06-16 16:31:45.841348] INFO [installing OB docker and starting OB server on 133.55.22.19, pid: 144513, log: /root/t-oceanbase-antman/logs/install_OB_docker.log and /home/admin/logs/ob-server/ inside docker]
[2023-06-16 16:31:45.925592] INFO [load docker image: docker load -i /root/t-oceanbase-antman/OB2276_x86_20210409.tar.gz]
[2023-06-16 16:31:45.930723] INFO [install_OB_docker.sh is still running on 133.55.22.19]
[2023-06-16 16:31:56.021465] INFO [install_OB_docker.sh is still running on 133.55.22.19]
Loaded image: reg.docker.alibaba-inc.com/antman/ob-docker:OB2276_x86_20210409
[2023-06-16 16:32:06.111458] INFO [install_OB_docker.sh is still running on 133.55.22.19]
[2023-06-16 16:32:06.359285] INFO [start container: docker run -d -it --cap-add SYS_RESOURCE --name META_OB_ZONE_1 --net=host     -e OBCLUSTER_NAME=obcluster      -e DEV_NAME=bond0     -e ROOTSERVICE_LIST="10.10.100.9:2882:2881;10.10.100.87:2882:2881;122.44.11.2:2882:2881"     -e DATAFILE_DISK_PERCENTAGE=90     -e CLUSTER_ID=1632654636     -e ZONE_NAME=META_OB_ZONE_1     -e OBPROXY_PORT=2883     -e MYSQL_PORT=2881     -e RPC_PORT=2882     -e OCP_VIP=134.80.173.57     -e OCP_VPORT=80     -e app.password_root='Jnydzycscc@123'     -e app.password_admin='OceanBase#123'     -e OBPROXY_OPTSTR=""     -e OPTSTR="cpu_count=64,system_memory=50G,memory_limit=254G,__min_full_resource_pool_memory=1073741824,_ob_enable_prepared_statement=false,memory_limit_percentage=90"     --cpu-period 100000     --cpu-quota 6400000     --cpuset-cpus 0-63     --memory 256G     -v /home/admin/oceanbase:/home/admin/oceanbase     -v /data/log1:/data/log1     -v /data/1:/data/1     --restart on-failure:5     reg.docker.alibaba-inc.com/antman/ob-docker:OB2276_x86_20210409]
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
4f1c15e8194cc1ae2fffcc124ea9c982b3fda87ce1a6d0038db88435c737af89
[2023-06-16 16:32:16.209761] INFO [install_OB_docker.sh finished and reg.docker.alibaba-inc.com/antman/ob-docker:OB2276_x86_20210409 started on 133.55.22.19]
[2023-06-16 16:32:16.214771] INFO [waiting on observer ready on 133.55.22.19]
[2023-06-16 16:35:16.244133] INFO [waiting on observer ready on 133.55.22.19 for 3 Minitues]
[2023-06-16 16:36:16.264776] INFO [waiting on observer ready on 133.55.22.19 for 4 Minitues]
[2023-06-16 16:37:16.285808] INFO [waiting on observer ready on 133.55.22.19 for 5 Minitues]
[2023-06-16 16:37:16.579057] INFO [observer on 133.55.22.19 is ready]
[2023-06-16 16:37:16.584583] INFO [deploy_ob: installation of ob cluster done]
[2023-06-16 16:37:16.588617] INFO [INSTALL_COMPONENT: ob DONE  ######################################]
[2023-06-16 16:37:16.593604] INFO [INSTALL_COMPONENT: obproxy START  ######################################]

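The log is too long to paste in full. As the output above shows, once the metadb Docker service has been added, the script moves on to adding the OBProxy Docker service.

Notes on the process

  1. 133.55.22.19 is the actual physical IP of the server that will replace the OCP server.

  2. With -z 1, 133.55.22.19 is added to the first zone of the OCP environment, that is, the same zone as the 10.10.100.9 machine found above.

    The zone here applies only to the meta_ob docker on an OCP server; the obproxy docker and the ocp docker have no notion of a zone.

    To see which zone the meta_ob docker on each OCP server belongs to, refer to these three variables in obcluster.conf:

    • ZONE1_RS_IP
    • ZONE2_RS_IP
    • ZONE3_RS_IP
  3. -R and -A take the root password and the admin password of the 133.55.22.19 server, respectively.

  4. -i installs; replacing it with -c clears.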
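
To pick the -z value without reading the file by eye, the lookup described in note 2 can be sketched as below. zone_index_for_ip is a hypothetical helper, not an ANTMAN command, and it assumes obcluster.conf uses the upper-case ZONEn_RS_IP=ip form shown earlier.

```shell
#!/usr/bin/env bash
# zone_index_for_ip CONF_FILE IP   (hypothetical helper)
# Prints n for the ZONEn_RS_IP entry whose value equals IP.
# Prints nothing if no entry matches.
zone_index_for_ip() {
  local conf="$1" ip="$2"
  awk -F= -v ip="$ip" '
    $1 ~ /^ZONE[0-9]+_RS_IP$/ {
      value = $2
      # Drop trailing whitespace or an inline annotation after the value.
      sub(/[ \t].*$/, "", value)
      if (value == ip) {
        idx = $1
        gsub(/[^0-9]/, "", idx)   # "ZONE1_RS_IP" -> "1"
        print idx
      }
    }' "$conf"
}
```

In this walkthrough the machine being replaced is 10.10.100.9, so the lookup is run with that IP against obcluster.conf, and the resulting index is what gets passed to -z for the new machine.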

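If everything went well, you can now log in to the OCP console at port 8080 of the newly added node, and connect to the meta database through port 2883 on that machine.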

3.3 Bringing the new node online

Log in to the sys tenant of the OCP metadb and bring the new meta_ob Docker online.

MySQL [oceanbase]> alter system add server '133.55.22.19:2882' zone 'META_OB_ZONE_1';
Query OK, 0 rows affected (0.02 sec)

MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
+--------------+----------------+-----------------+--------+--------------------+
| svr_ip       | zone           | with_rootserver | status | start_service_time |
+--------------+----------------+-----------------+--------+--------------------+
| 10.10.100.87 | META_OB_ZONE_2 |               1 | active |   1679279220991796 |
| 10.10.100.9  | META_OB_ZONE_1 |               0 | active |   1679275725595691 |
| 122.44.11.2  | META_OB_ZONE_3 |               0 | active |   1640097866698859 |
| 133.55.22.19 | META_OB_ZONE_1 |               0 | active |                  0 |
+--------------+----------------+-----------------+--------+--------------------+
4 rows in set (0.00 sec)

MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
+--------------+----------------+-----------------+--------+--------------------+
| svr_ip       | zone           | with_rootserver | status | start_service_time |
+--------------+----------------+-----------------+--------+--------------------+
| 10.10.100.87 | META_OB_ZONE_2 |               1 | active |   1679279220991796 |
| 10.10.100.9  | META_OB_ZONE_1 |               0 | active |   1679275725595691 |
| 122.44.11.2  | META_OB_ZONE_3 |               0 | active |   1640097866698859 |
| 133.55.22.19 | META_OB_ZONE_1 |               0 | active |   1686908200404755 |
+--------------+----------------+-----------------+--------+--------------------+
4 rows in set (0.01 sec)

3.4 Taking the replaced node offline

Log in to the sys tenant of the OCP metadb and take the replaced meta_ob Docker offline.

MySQL [oceanbase]>  alter system delete server '10.10.100.9:2882' zone 'META_OB_ZONE_1';
Query OK, 0 rows affected (0.19 sec)

MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
+--------------+----------------+-----------------+----------+--------------------+
| svr_ip       | zone           | with_rootserver | status   | start_service_time |
+--------------+----------------+-----------------+----------+--------------------+
| 10.10.100.87 | META_OB_ZONE_2 |               1 | active   |   1679279220991796 |
| 10.10.100.9  | META_OB_ZONE_1 |               0 | deleting |   1679275725595691 |
| 122.44.11.2  | META_OB_ZONE_3 |               0 | active   |   1640097866698859 |
| 133.55.22.19 | META_OB_ZONE_1 |               0 | active   |   1686908200404755 |
+--------------+----------------+-----------------+----------+--------------------+
4 rows in set (0.01 sec)
MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
+--------------+----------------+-----------------+--------+--------------------+
| svr_ip       | zone           | with_rootserver | status | start_service_time |
+--------------+----------------+-----------------+--------+--------------------+
| 10.10.100.87 | META_OB_ZONE_2 |               1 | active |   1679279220991796 |
| 122.44.11.2  | META_OB_ZONE_3 |               0 | active |   1640097866698859 |
| 133.55.22.19 | META_OB_ZONE_1 |               0 | active |   1686908200404755 |
+--------------+----------------+-----------------+--------+--------------------+
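
Both the add in 3.3 and the delete in 3.4 complete asynchronously, which is why the same query appears twice in each transcript. A small classifier like the following could drive a polling loop; it is only a sketch. server_state is a hypothetical helper that assumes tab-separated svr_ip, status, start_service_time rows on stdin, and the names starting, serving, and gone are my own labels rather than OceanBase statuses.

```shell
#!/usr/bin/env bash
# server_state IP   (hypothetical helper)
# Reads "svr_ip<TAB>status<TAB>start_service_time" rows on stdin and prints:
#   gone      - no row for IP (the delete has finished)
#   starting  - status is active but start_service_time is still 0
#   serving   - status is active and start_service_time is set
#   <status>  - any other status is printed unchanged, e.g. deleting
server_state() {
  local ip="$1"
  awk -F'\t' -v ip="$ip" '
    $1 == ip {
      found = 1
      if ($2 != "active")   print $2
      else if ($3 + 0 == 0) print "starting"
      else                  print "serving"
    }
    END { if (!found) print "gone" }'
}
```

Fed with the rows of the first query in 3.3, the new server 133.55.22.19 would be classified as starting; with the last query in 3.4, 10.10.100.9 would be classified as gone.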

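3.5 Updating server information

Log in to the ocp_meta tenant and update the OCP server information by hand.

After the previous steps, the OCP console still shows leftover information about the old host, which has to be replaced.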

55obffocp:~/t-oceanbase-antman # mysql -h10.10.100.87 -P2883 -uroot@ocp_meta#obcluster -p'rSf@jO%6EO' -Docp -c


MySQL [ocp]>  select * from compute_host where inner_ip_address='10.10.100.9'\G
*************************** 1. row ***************************
              id: 1
            name: ocp1a
     description: NULL
operating_system: 4.12.14-120-default
    architecture: x86_64
inner_ip_address: 10.10.100.9
        ssh_port: 2022
            kind: DEDICATED_PHYSICAL_MACHINE
   publish_ports: NULL
          status: ONLINE
          vpc_id: 1
          idc_id: 1
    host_type_id: 1
   serial_number: NULL
           alias: NULL
     create_time: 2021-09-26 21:04:11
     update_time: 2023-03-20 11:01:58
1 row in set (0.00 sec)

MySQL [ocp]> update compute_host set inner_ip_address='133.55.22.19', name='55obffocp' where inner_ip_address='10.10.100.9';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

MySQL [ocp]> select * from compute_host where id =1;
+----+-----------+-------------+---------------------+--------------+------------------+----------+----------------------------+---------------+--------+--------+--------+--------------+---------------+-------+---------------------+---------------------+
| id | name      | description | operating_system    | architecture | inner_ip_address | ssh_port | kind                       | publish_ports | status | vpc_id | idc_id | host_type_id | serial_number | alias | create_time         | update_time         |
+----+-----------+-------------+---------------------+--------------+------------------+----------+----------------------------+---------------+--------+--------+--------+--------------+---------------+-------+---------------------+---------------------+
|  1 | 55obffocp | NULL        | 4.12.14-120-default | x86_64       | 133.55.22.19     |     2022 | DEDICATED_PHYSICAL_MACHINE | NULL          | ONLINE |      1 |      1 |            1 | NULL          | NULL  | 2021-09-26 21:04:11 | 2023-06-16 17:47:19 |
+----+-----------+-------------+---------------------+--------------+------------------+----------+----------------------------+---------------+--------+--------+--------+--------------+---------------+-------+---------------------+---------------------+
1 row in set (0.00 sec)

3.6 Cleaning up leftover services on the replaced machine

ocp1a:~/t-oceanbase-antman #  ./manage.sh -c ob,ocp,obproxy -l 10.10.100.9 -z 1 -R 'Dt!n(Rg4Av!t' -A OceanBase#123
grep: /etc/system-release: No such file or directory
[2023-06-16 22:45:44.101400] INFO [check conf file /root/t-oceanbase-antman/obcluster.conf format ...]
[2023-06-16 22:45:44.106779] INFO [conf file is upper case format.]
[2023-06-16 22:45:44.114437] INFO [SSH_AUTH=password SSH_USER=root SSH_PORT=22 SSH_PASSWORD= SSH_KEY_FILE=/root/.ssh/id_rsa]
LB_MODE=f5
INSTALL_COMPONENTS componets: 
CLEAR_COMPONENTS: ob obproxy ocp
IP_LIST: 10.10.100.9
ZONE_LIST: 1
ROOT_PASSWORD_LIST: Dt!n(Rg4Av!t
ADMIN_PASSWORD_LIST: OceanBase#123
[2023-06-16 22:45:44.474268] INFO [CLEAR_COMPONENT: ob START  ######################################]
cp: '/root/t-oceanbase-antman/uninstall.sh' and '/root/t-oceanbase-antman/uninstall.sh' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/obcluster.conf' and '/root/t-oceanbase-antman/obcluster.conf' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/common/utils.sh' and '/root/t-oceanbase-antman/common/utils.sh' are the same file
skip copy same file
grep: /etc/system-release: No such file or directory
[2023-06-16 22:45:44.504069] INFO [remove OB server and docker on host: 10.10.100.9]
[2023-06-16 22:45:44.548697] INFO [docker rm -f 62ab623cb4ed]
62ab623cb4ed
[2023-06-16 22:46:01.260706] INFO [remove OB server and docker on host: 10.10.100.9 done!]
[2023-06-16 22:46:01.370808] INFO [uninstall.sh ob  finished and reg.docker.alibaba-inc.com/antman/ob-docker:OB2276_x86_20210409 removed on 10.10.100.9]
[2023-06-16 22:46:01.375914] INFO [OB docker on 10.10.100.9 is removed]
[2023-06-16 22:46:01.380667] INFO [CLEAR_COMPONENT: ob DONE  ######################################]
[2023-06-16 22:46:01.385398] INFO [CLEAR_COMPONENT: obproxy START  ######################################]
cp: '/root/t-oceanbase-antman/uninstall.sh' and '/root/t-oceanbase-antman/uninstall.sh' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/obcluster.conf' and '/root/t-oceanbase-antman/obcluster.conf' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/common/utils.sh' and '/root/t-oceanbase-antman/common/utils.sh' are the same file
skip copy same file
grep: /etc/system-release: No such file or directory
[2023-06-16 22:46:01.416495] INFO [remove obproxy docker on host:10.10.100.9]
[2023-06-16 22:46:01.514215] INFO [docker rm -f 01bdcadf2e11]
01bdcadf2e11
[2023-06-16 22:46:01.765459] INFO [remove obproxy docker on host:10.10.100.9 done!]
[2023-06-16 22:46:01.858848] INFO [uninstall.sh obproxy  finished and reg.docker.alibaba-inc.com/antman/obproxy:OBP186_20210315 removed on 10.10.100.9]
[2023-06-16 22:46:01.863806] INFO [obproxy docker on 10.10.100.9 is removed]
[2023-06-16 22:46:01.868778] INFO [CLEAR_COMPONENT: obproxy DONE  ######################################]
[2023-06-16 22:46:01.873368] INFO [CLEAR_COMPONENT: ocp START  ######################################]
cp: '/root/t-oceanbase-antman/uninstall.sh' and '/root/t-oceanbase-antman/uninstall.sh' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/obcluster.conf' and '/root/t-oceanbase-antman/obcluster.conf' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/common/utils.sh' and '/root/t-oceanbase-antman/common/utils.sh' are the same file
skip copy same file
grep: /etc/system-release: No such file or directory
[2023-06-16 22:46:01.906934] INFO [remove ocp docker on host:10.10.100.9]
[2023-06-16 22:46:01.944811] INFO [docker rm -f 8b044744a92e]
8b044744a92e
[2023-06-16 22:46:26.162467] INFO [remove ocp docker on host:10.10.100.9 done]
[2023-06-16 22:46:26.253927] INFO [uninstall.sh ocp  finished and reg.docker.alibaba-inc.com/oceanbase/ocp-all-in-one:3.3.3-20220906114643 removed on 10.10.100.9]
[2023-06-16 22:46:26.258281] INFO [ocp docker on 10.10.100.9 is removed]
[2023-06-16 22:46:26.263047] INFO [CLEAR_COMPONENT: ocp DONE  ######################################]
ocp1a:~/t-oceanbase-antman # docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

The services have been removed.

Errors encountered and how they were handled

The manage script fails.

55obffocp:~/t-oceanbase-antman # ./manage.sh -i ob,ocp,obproxy -l 133.55.22.19 -z 1 -R Jnydzycscc@123 -A OceanBase#123
[2023-06-16 16:31:03.305062] INFO [check conf file /root/t-oceanbase-antman/obcluster.conf format ...]
[2023-06-16 16:31:03.309079] INFO [conf file is upper case format.]
[2023-06-16 16:31:03.315290] INFO [SSH_AUTH=password SSH_USER=root SSH_PORT=22 SSH_PASSWORD= SSH_KEY_FILE=/root/.ssh/id_rsa]
LB_MODE=f5
INSTALL_COMPONENTS componets: ob obproxy ocp
CLEAR_COMPONENTS: 
IP_LIST: 133.55.22.19
ZONE_LIST: 1
ROOT_PASSWORD_LIST: Jnydzycscc@123
ADMIN_PASSWORD_LIST: OceanBase#123
/root/t-oceanbase-antman/common/utils.sh: line 484: -e: command not found
[2023-06-16 16:31:03.636862] ERROR [: ssh authorization to 133.55.22.19 failed, Please check SSH affinity environment varialbes.]

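This problem also has to be fixed by modifying the script code.

Long after running alter system delete server, the replaced server still had not been deleted and stayed in the deleting state. It turned out that the memory parameters of the OCP meta database had been tuned earlier, and the newly added server came up with a smaller value, so the unit migration was stuck.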

MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
+--------------+----------------+-----------------+----------+--------------------+
| svr_ip       | zone           | with_rootserver | status   | start_service_time |
+--------------+----------------+-----------------+----------+--------------------+
| 10.10.100.87 | META_OB_ZONE_2 |               1 | active   |   1679279220991796 |
| 10.10.100.9  | META_OB_ZONE_1 |               0 | deleting |   1679275725595691 |
| 122.44.11.2  | META_OB_ZONE_3 |               0 | active   |   1640097866698859 |
| 133.55.22.19 | META_OB_ZONE_1 |               0 | active   |   1686908200404755 |
+--------------+----------------+-----------------+----------+--------------------+
4 rows in set (0.01 sec)


MySQL [oceanbase]> select count(*),svr_ip from  gv$unit  group by svr_ip;
+----------+--------------+
| count(*) | svr_ip       |
+----------+--------------+
|       27 | 133.55.22.19 |
|       33 | 10.10.100.87 |
|       33 | 122.44.11.2  |
|        6 | 10.10.100.9  |
+----------+--------------+
4 rows in set (0.01 sec)
MySQL [oceanbase]> select *   from  gv$unit   where  svr_ip='10.10.100.9';
+---------+----------------+---------------------------------------------+------------------+----------------------------------------+----------------+-----------+-------------+-------------+----------+---------------------+-----------------------+---------+---------+-------------+-------------+----------+----------+---------------+-----------------+
| unit_id | unit_config_id | unit_config_name                            | resource_pool_id | resource_pool_name                     | zone           | tenant_id | tenant_name | svr_ip      | svr_port | migrate_from_svr_ip | migrate_from_svr_port | max_cpu | min_cpu | max_memory  | min_memory  | max_iops | min_iops | max_disk_size | max_session_num |
+---------+----------------+---------------------------------------------+------------------+----------------------------------------+----------------+-----------+-------------+-------------+----------+---------------------+-----------------------+---------+---------+-------------+-------------+----------+----------+---------------+-----------------+
|    1106 |           1090 | config_oms_tt_tenant_META_OB_ZONE_1_S2_gpa  |             1080 | pool_oms_tt_tenant_META_OB_ZONE_1_gpa  | META_OB_ZONE_1 |      NULL | NULL        | 10.10.100.9 |     2882 |                     |                     0 |       3 |       3 | 12884901888 | 12884901888 |     2500 |     2500 |  536870912000 |             750 |
|    1139 |           1094 | oms_unit                                    |             1129 | oms_ff9_tenant_resource_pool           | META_OB_ZONE_1 |      NULL | NULL        | 10.10.100.9 |     2882 |                     |                     0 |       2 |       2 |  5368709120 |  4294967296 |      128 |      128 |    5368709120 |           10000 |
|    1122 |           1097 | config_oms_c55_tenant_META_OB_ZONE_1_S1_ifu |             1088 | pool_oms_c55_tenant_META_OB_ZONE_1_ifu | META_OB_ZONE_1 |      NULL | NULL        | 10.10.100.9 |     2882 |                     |                     0 |     1.5 |     1.5 |  6442450944 |  6442450944 |     1250 |     1250 |  536870912000 |             375 |
|    1126 |           1100 | config_oms_ff6_tenant_META_OB_ZONE_1_S1_uzz |             1092 | pool_oms_ff6_tenant_META_OB_ZONE_1_uzz | META_OB_ZONE_1 |      NULL | NULL        | 10.10.100.9 |     2882 |                     |                     0 |     1.5 |     1.5 |  6442450944 |  6442450944 |     1250 |     1250 |  536870912000 |             375 |
|    1127 |           1101 | config_oms_ff7_tenant_META_OB_ZONE_1_S1_gkj |             1093 | pool_oms_ff7_tenant_META_OB_ZONE_1_gkj | META_OB_ZONE_1 |      NULL | NULL        | 10.10.100.9 |     2882 |                     |                     0 |     1.5 |     1.5 |  6442450944 |  6442450944 |     1250 |     1250 |  536870912000 |             375 |
|    1135 |           1108 | config_oms_cc8_tenant_META_OB_ZONE_1_S1_wwo |             1101 | pool_oms_cc8_tenant_META_OB_ZONE_1_wwo | META_OB_ZONE_1 |      NULL | NULL        | 10.10.100.9 |     2882 |                     |                     0 |     1.5 |     1.5 |  6442450944 |  6442450944 |     1250 |     1250 |  536870912000 |             375 |
+---------+----------------+---------------------------------------------+------------------+----------------------------------------+----------------+-----------+-------------+-------------+----------+---------------------+-----------------------+---------+---------+-------------+-------------+----------+----------+---------------+-----------------+
6 rows in set (0.02 sec)


MySQL [oceanbase]> alter  system migrate unit=1106 destination='133.55.22.19:2882';
ERROR 4624 (HY000): machine resource is not enough to hold a new unit              ---------------- migrating manually reports insufficient resources

MySQL [oceanbase]> select zone,svr_ip, cpu_total, cpu_assigned,cpu_assigned_percent cpu_ass_pct, round(mem_total/1024/1024/1024) mem_total_gb,
    ->        round(mem_assigned/1024/1024/1024) mem_ass_gb, mem_assigned_percent mem_ass_pct, unit_num, migrating_unit_num, leader_count, round(`load`,2) `load`
    -> from __all_virtual_server_stat
    -> order by zone, svr_ip;                                --------------------- checking resources shows that memory is insufficient
+----------------+--------------+-----------+--------------+-------------+--------------+------------+-------------+----------+--------------------+--------------+------+
| zone           | svr_ip       | cpu_total | cpu_assigned | cpu_ass_pct | mem_total_gb | mem_ass_gb | mem_ass_pct | unit_num | migrating_unit_num | leader_count | load |
+----------------+--------------+-----------+--------------+-------------+--------------+------------+-------------+----------+--------------------+--------------+------+
| META_OB_ZONE_1 | 10.10.100.9  |        62 |           11 |          17 |          250 |         40 |          16 |        6 |                  0 |            0 | 0.17 |
| META_OB_ZONE_1 | 133.55.22.19 |        62 |           48 |          77 |          204 |        196 |          96 |       27 |                  0 |            0 | 0.87 |
| META_OB_ZONE_2 | 10.10.100.87 |        62 |           59 |          95 |          250 |        236 |          94 |       33 |                  0 |         2935 | 0.95 |
| META_OB_ZONE_3 | 122.44.11.2  |        62 |           59 |          95 |          250 |        236 |          94 |       33 |                  0 |         1051 | 0.95 |

MySQL [oceanbase]> show parameters  like  '%memory_limit%'
 ;
| META_OB_ZONE_3 | observer | 122.44.11.2  |     2882 | memory_limit                        | NULL      | 300G  | the size of the memory reserved for internal use(for testing purpose). Range: [0M,)                                                                                                                                                    | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| META_OB_ZONE_1 | observer | 133.55.22.19 |     2882 | memory_limit                        | NULL      | 254G  | the size of the memory reserved for internal use(for testing purpose). Range: [0M,)                                                                                                                                                    | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| META_OB_ZONE_2 | observer | 10.10.100.87 |     2882 | memory_limit                        | NULL      | 300G  | the size of the memory reserved for internal use(for testing purpose). Range: [0M,)                                                                                                                                                    | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| META_OB_ZONE_1 | observer | 10.10.100.9  |     2882 | memory_limit                        | NULL      | 300G  | the size of the memory reserved for internal use(for testing purpose). Range: [0M,)                                                                                                                                                    | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |

MySQL [oceanbase]> alter   system  set   memory_limit ='300G'  ;
Query OK, 0 rows affected (0.05 sec)

MySQL [oceanbase]> select count(*),svr_ip from  gv$unit  group by svr_ip;
+----------+--------------+
| count(*) | svr_ip       |
+----------+--------------+
|       33 | 133.55.22.19 |
|       33 | 10.10.100.87 |
|       33 | 122.44.11.2  |
+----------+--------------+
3 rows in set (0.00 sec)


MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
+--------------+----------------+-----------------+--------+--------------------+
| svr_ip       | zone           | with_rootserver | status | start_service_time |
+--------------+----------------+-----------------+--------+--------------------+
| 10.10.100.87 | META_OB_ZONE_2 |               1 | active |   1679279220991796 |
| 122.44.11.2  | META_OB_ZONE_3 |               0 | active |   1640097866698859 |
| 133.55.22.19 | META_OB_ZONE_1 |               0 | active |   1686908200404755 |
+--------------+----------------+-----------------+--------+--------------------+
3 rows in set (0.00 sec)
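The final `__all_server` output can also be checked mechanically rather than by eye. The following is a minimal sketch, not part of ANTMAN or OCP. The rows are copied by hand from the output above. The helper names `check_cluster` and `started_at` are made up for illustration. It assumes that 10.10.100.9 is the replaced node and that `start_service_time` is in microseconds since the Unix epoch.

```python
from datetime import datetime, timezone

# Rows copied by hand from the __all_server output above.
ROWS = [
    {"svr_ip": "10.10.100.87", "zone": "META_OB_ZONE_2", "with_rootserver": 1,
     "status": "active", "start_service_time": 1679279220991796},
    {"svr_ip": "122.44.11.2", "zone": "META_OB_ZONE_3", "with_rootserver": 0,
     "status": "active", "start_service_time": 1640097866698859},
    {"svr_ip": "133.55.22.19", "zone": "META_OB_ZONE_1", "with_rootserver": 0,
     "status": "active", "start_service_time": 1686908200404755},
]

# Assumption: this is the node that was replaced.
OLD_SERVER_IP = "10.10.100.9"


def check_cluster(rows, retired_ip):
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = []
    # The replaced node should no longer be listed at all.
    if any(r["svr_ip"] == retired_ip for r in rows):
        problems.append(f"{retired_ip} is still listed in __all_server")
    # Every remaining server should be active.
    inactive = [r["svr_ip"] for r in rows if r["status"] != "active"]
    if inactive:
        problems.append(f"servers not active: {', '.join(inactive)}")
    # Exactly one server should host the root service.
    rs_count = sum(1 for r in rows if r["with_rootserver"] == 1)
    if rs_count != 1:
        problems.append(f"expected 1 server with_rootserver=1, found {rs_count}")
    return problems


def started_at(row):
    """Convert start_service_time (assumed microseconds) to a UTC datetime."""
    return datetime.fromtimestamp(row["start_service_time"] // 1_000_000,
                                  tz=timezone.utc)


if __name__ == "__main__":
    print(check_cluster(ROWS, OLD_SERVER_IP) or "basic checks passed")
    for row in ROWS:
        print(row["svr_ip"], row["zone"], started_at(row).isoformat())
```

The node with the most recent start time is 133.55.22.19 in META_OB_ZONE_1, which is consistent with it being the newly added machine.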

Summary

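That completes the replacement of an OCP machine with the ANTMAN tool. Like the previous article on replacing an OCP node with OAT, the procedure may not look difficult on paper, but I ran through the whole process several times, with plenty of setbacks and tears along the way. I wrote these two articles so that others can avoid the same pitfalls.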

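If you read the previous article, you know that when OAT replaces an OCP node, the new machine is added to metadb as a new ZONE, and the machine being replaced is then taken offline. That path also involves creating a new resource pool, modifying the Locality, and increasing the replica count. With ANTMAN the steps are different: the new machine joins the same ZONE as the machine being replaced, the UNITs are migrated within that ZONE, and the replaced machine is then taken offline.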
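The same-ZONE flow can be written out as an ordered list of statements. This is a rough sketch only: the function name `same_zone_replacement_sql` is hypothetical, and the statements ANTMAN actually issues may differ. It assumes the standard OceanBase `ALTER SYSTEM ADD SERVER` and `ALTER SYSTEM DELETE SERVER` commands, and that deleting a server causes its UNITs to be migrated to another server in the same ZONE. The two queries are the ones used earlier in this article.

```python
def same_zone_replacement_sql(zone, new_ip, old_ip, port=2882):
    """Return, in order, the statements for a same-ZONE node swap (a sketch)."""
    new_addr = f"{new_ip}:{port}"
    old_addr = f"{old_ip}:{port}"
    return [
        # 1. Add the new machine to the ZONE of the machine being replaced.
        f"ALTER SYSTEM ADD SERVER '{new_addr}' ZONE '{zone}';",
        # 2. Delete the old machine; its UNITs are assumed to move within the ZONE.
        f"ALTER SYSTEM DELETE SERVER '{old_addr}' ZONE '{zone}';",
        # 3. Poll until the old machine no longer holds any UNIT.
        "SELECT count(*), svr_ip FROM gv$unit GROUP BY svr_ip;",
        # 4. Confirm the old machine is gone and the rest are active.
        "SELECT svr_ip, zone, with_rootserver, status FROM __all_server;",
    ]


if __name__ == "__main__":
    # Addresses taken from this article; 10.10.100.9 is assumed to be the old node.
    for stmt in same_zone_replacement_sql("META_OB_ZONE_1", "133.55.22.19", "10.10.100.9"):
        print(stmt)
```

Because the ZONE count and the Locality stay the same, no new resource pool or replica change is needed, which is the main difference from the OAT path described above.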

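At this stage, replacing a node with ANTMAN has comparatively less impact on the OCP metadata, while OAT requires fewer command-line ("black screen") operations. For earlier deployments in which OBProxy runs in its own Docker container, ANTMAN is the only option; for later versions, choose whichever approach suits your situation.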

Follow the path you have chosen, and do not ask how far away the destination is.

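About SQLE

SQLE, from the ActionTech open source community, is a SQL audit tool for database users and administrators. It supports auditing in multiple scenarios and a standardized release workflow, natively supports MySQL auditing, and can be extended to other database types.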

Getting SQLE

Type                                       Address
Repository                                 https://github.com/actiontech/sqle
Documentation                              https://actiontech.github.io/sqle-docs/
Releases                                   https://github.com/actiontech/sqle/releases
Data audit plugin development guide        https://actiontech.github.io/sqle-docs-cn/3.modules/3.7_auditplugin/auditplugin_development.html
