详解 InnoDB Cluster 主机名问题
文章目录
- 详解 InnoDB Cluster 主机名问题
- 导言
- 测试过程
- 结论
导言
因在写 【InnoDB Cluster】修改已有集群实例名称及成员实例选项 时发现主机名这块有一些问题,在其中进行了部分测试,但为使其内容精简,故将此部分单独拿出来形成一篇文章。要了解更多信息,请读者参阅该篇文章。
测试过程
使用cluster.setInstanceOption('旧成员实例名称','label','新成员实例名称')
来修改成员实例名称。
MySQL ic-source:33060+ ssl JS > cluster.setInstanceOption('replica3:3306','label','replica2.ic')
Setting the value of 'label' to 'replica2.ic' in the instance: 'replica3:3306' ...
Successfully set the value of 'label' to 'replica2.ic' in the cluster member: 'replica3:3306'.
cluster.status()
结果:
MySQL ic-source:33060+ ssl JS > cluster.status()
{
"clusterName": "myCluster",
"defaultReplicaSet": {
"name": "default",
"primary": "ic-source:3306",
"ssl": "REQUIRED",
"status": "OK",
"statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
"topology": {
"ic-replica1:3306": {
"address": "ic-replica1:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
},
"ic-source:3306": {
"address": "ic-source:3306",
"memberRole": "PRIMARY",
"mode": "R/W",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
},
"replica2.ic": {
"address": "replica3:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
}
},
"topologyMode": "Single-Primary"
},
"groupInformationSourceMember": "ic-source:3306"
}
cluster.options()
结果:
MySQL ic-source:33060+ ssl JS > cluster.options()
...
"tags": {
"global": [],
"ic-replica1:3306": [],
"ic-source:3306": [],
"replica2.ic": []
},
...
"replica2.ic": [
注意,这里有个问题,我故意拿 replica3:3306 作为cluster.setInstanceOption()
方法的实例名称参数,却依然可以运行,证明旧的实例名称并没有完全失效。那是主机缓存的问题吗?
我们来FLUSH HOSTS;
试试。
mysql> flush hosts;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show warnings;
+---------+------+--------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+--------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1287 | 'FLUSH HOSTS' is deprecated and will be removed in a future release. Please use TRUNCATE TABLE performance_schema.host_cache instead |
+---------+------+--------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show warnings;
+---------+------+--------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+--------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1287 | 'FLUSH HOSTS' is deprecated and will be removed in a future release. Please use TRUNCATE TABLE performance_schema.host_cache instead |
+---------+------+--------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show tables from performance_schema like 'host%';
+--------------------------------------+
| Tables_in_performance_schema (host%) |
+--------------------------------------+
| host_cache |
| hosts |
+--------------------------------------+
2 rows in set (0.04 sec)
mysql> select * from performance_schema.host_cache;
Empty set (0.06 sec)
再次设置label
选项:
MySQL ic-source:33060+ ssl JS > cluster.setInstanceOption('replica3:3306','label','replica2')
Setting the value of 'label' to 'replica2' in the instance: 'replica3:3306' ...
Successfully set the value of 'label' to 'replica2' in the cluster member: 'replica3:3306'.
MySQL ic-source:33060+ ssl JS > cluster.status()
{
"clusterName": "myCluster",
"defaultReplicaSet": {
"name": "default",
"primary": "ic-source:3306",
"ssl": "REQUIRED",
"status": "OK",
"statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
"topology": {
"ic-replica1:3306": {
"address": "ic-replica1:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
},
"ic-source:3306": {
"address": "ic-source:3306",
"memberRole": "PRIMARY",
"mode": "R/W",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
},
"replica2": {
"address": "replica3:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
}
},
"topologyMode": "Single-Primary"
},
"groupInformationSourceMember": "ic-source:3306"
}
还是可以运行。那再截断performance_schema.hosts
表呢?
mysql> select * from performance_schema.hosts;
+-------------+---------------------+-------------------+-------------------------------+--------------------------+
| HOST | CURRENT_CONNECTIONS | TOTAL_CONNECTIONS | MAX_SESSION_CONTROLLED_MEMORY | MAX_SESSION_TOTAL_MEMORY |
+-------------+---------------------+-------------------+-------------------------------+--------------------------+
| NULL | 39 | 86 | 272128 | 66816672 |
| localhost | 2 | 5 | 1477504 | 1985033 |
| ic-source | 0 | 45 | 2466600 | 2817255 |
| ic-replica1 | 0 | 2 | 8240 | 74084 |
| ic-replica2 | 0 | 6 | 20576 | 126813 |
| replica2 | 1 | 38 | 20576 | 74084 |
| source | 15 | 159 | 2466360 | 2860667 |
| replica1 | 1 | 3 | 20576 | 74084 |
+-------------+---------------------+-------------------+-------------------------------+--------------------------+
8 rows in set (0.00 sec)
mysql> truncate table performance_schema.hosts;
Query OK, 0 rows affected (0.09 sec)
mysql> select * from performance_schema.hosts;
+-----------+---------------------+-------------------+-------------------------------+--------------------------+
| HOST | CURRENT_CONNECTIONS | TOTAL_CONNECTIONS | MAX_SESSION_CONTROLLED_MEMORY | MAX_SESSION_TOTAL_MEMORY |
+-----------+---------------------+-------------------+-------------------------------+--------------------------+
| NULL | 39 | 39 | 108512 | 1370284 |
| localhost | 2 | 2 | 647040 | 671012 |
| replica2 | 1 | 1 | 8240 | 74084 |
| source | 15 | 15 | 1841072 | 2397944 |
| replica1 | 1 | 1 | 8240 | 74084 |
+-----------+---------------------+-------------------+-------------------------------+--------------------------+
5 rows in set (0.00 sec)
再次设置label
选项:
MySQL ic-source:33060+ ssl JS > cluster.setInstanceOption('replica3:3306','label','replica2.ic')
Setting the value of 'label' to 'replica2.ic' in the instance: 'replica3:3306' ...
Successfully set the value of 'label' to 'replica2.ic' in the cluster member: 'replica3:3306'.
还是可以运行。
MySQL ic-source:33060+ ssl JS > cluster.status()
...
"replica2.ic": {
"address": "replica3:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
}
所以估计还是其他地方有缓存,或者和addresses
可以识别出旧的实例名称。继而,我修改addresses
中的mysqlClassic
值为 r2-table:3306,即普通 MySQL 客户端使用的经典(标准) MySQL 协议的通讯地址。
mysql> update instances set addresses=json_replace(addresses,'$.mysqlClassic','r2-table:3306') where instance_id=4;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from instances where instance_id=4;
+-------------+--------------------------------------+---------------+--------------------------------------+---------------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| instance_id | cluster_id | address | mysql_server_uuid | instance_name | addresses | attributes | description |
+-------------+--------------------------------------+---------------+--------------------------------------+---------------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| 4 | f465b0e3-6ce3-11ed-8310-000c298d6cb9 | replica3:3306 | 2737b324-a0f6-11ea-afd1-000c2988ff33 | replica2.ic | {"mysqlX": "replica3:33060", "grLocal": "r2-table:3306", "mysqlClassic": "r2-table:3306"} | {"joinTime": "2022-11-26 01:12:34.768", "server_id": 1250867127, "recoveryAccountHost": "%", "recoveryAccountUser": "mysql_innodb_cluster_1250867127"} | NULL |
+-------------+--------------------------------------+---------------+--------------------------------------+---------------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
1 row in set (0.00 sec)
然后尝试修改label
,发现报错,
MySQL ic-source:33060+ ssl JS > cluster.setInstanceOption('replica3:3306','label','replica2')
Cluster.setInstanceOption: The instance 'replica3:3306' does not belong to the cluster. (RuntimeError)
MySQL ic-source:33060+ ssl JS > cluster.status()
{
"clusterName": "myCluster",
"defaultReplicaSet": {
"name": "default",
"primary": "ic-source:3306",
"ssl": "REQUIRED",
"status": "OK",
"statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
"topology": {
"ic-replica1:3306": {
"address": "ic-replica1:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
},
"ic-source:3306": {
"address": "ic-source:3306",
"memberRole": "PRIMARY",
"mode": "R/W",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
},
"replica2.ic": {
"address": "r2-table:3306",
"instanceErrors": [
"ERROR: Metadata for this instance does not match hostname reported by instance (metadata=r2-table:3306, actual=replica3:3306). Use rescan() to update the metadata."
],
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
}
},
"topologyMode": "Single-Primary"
},
"groupInformationSourceMember": "ic-source:3306"
}
这里我们暂时先不rescan()
,尝试使用其他成员实例名称(当前成员实例名称 replica2.ic,当前成员实例的 group_replication_local_address 变量值 r2-cnf:3306,和当前元数据表中新设的值 r2-table:3306 ):
MySQL ic-source:33060+ ssl JS > cluster.setInstanceOption('replica2.ic','label','replica2')
Cluster.setInstanceOption: The instance 'replica2.ic:3306' does not belong to the cluster. (RuntimeError)
MySQL ic-source:33060+ ssl JS > cluster.setInstanceOption('r2-cnf:3306','label','replica2')
Cluster.setInstanceOption: The instance 'r2-cnf:3306' does not belong to the cluster. (RuntimeError)
MySQL ic-source:33060+ ssl JS > cluster.setInstanceOption('r2-table:3306','label','replica2')
Cluster.setInstanceOption: The instance 'r2-table:3306' does not belong to the cluster. (RuntimeError)
再试下mysqlX
的值 replica3:33060 ,显然协议都不对:
MySQL ic-source:33060+ ssl JS > cluster.setInstanceOption('replica3:33060','label','replica2')
ERROR: Unable to connect to the target instance 'replica3:33060'. Please verify the connection settings, make sure the instance is available and try again.
Cluster.setInstanceOption: Could not open connection to 'replica3:33060': The provided URI uses the X protocol, which is not supported by this command. (RuntimeError)
都报错,证明改这个东西导致 MYSQL Shell 元数据与实际情况出现了不匹配,虽然集群仍保持高可用,成员实例仍保持在线状态,但 MYSQL Shell API 已经无法对该成员实例进行修改了。我验证了此时也仍可建立新客户端会话。由此证明mysqlClassic
很重要!
那么我们改回正确的值 replica3:3306 :
mysql> update instances set addresses=json_replace(addresses,'$.mysqlClassic','replica3:3306') where instance_id=4;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
查看集群状态显示已经恢复了。把所有/etc/hosts
中该 IP 地址对应的名称全都拿来作为成员实例名称传入该方法的第一个参数中:
MySQL ic-source:33060+ ssl JS > cluster.setInstanceOption('r2-table:3306','label','replica2')
Setting the value of 'label' to 'replica2' in the instance: 'r2-table:3306' ...
Successfully set the value of 'label' to 'replica2' in the cluster member: 'r2-table:3306'.
MySQL ic-source:33060+ ssl JS > cluster.setInstanceOption('replica3:3306','label','replica2.ic')
Setting the value of 'label' to 'replica2.ic' in the instance: 'replica3:3306' ...
Successfully set the value of 'label' to 'replica2.ic' in the cluster member: 'replica3:3306'.
MySQL ic-source:33060+ ssl JS > cluster.setInstanceOption('r2-cnf:3306','label','replica2')
Setting the value of 'label' to 'replica2' in the instance: 'r2-cnf:3306' ...
MySQL ic-source:33060+ ssl JS > cluster.setInstanceOption('replica2.ic:3306','label','replica2.ic')
Setting the value of 'label' to 'replica2.ic' in the instance: 'replica2.ic:3306' ...
Successfully set the value of 'label' to 'replica2.ic' in the cluster member: 'replica2.ic:3306'.
MySQL ic-source:33060+ ssl JS > cluster.setInstanceOption('ic-replica2:3306','label','replica2')
Setting the value of 'label' to 'replica2' in the instance: 'ic-replica2:3306' ...
Successfully set the value of 'label' to 'replica2' in the cluster member: 'ic-replica2:3306'.
都可以运行,证明并不是只识别旧的实例名称,而是/etc/hosts
中为该 IP 地址定义的所有名称!
那么为什么如此呢?前面我们已经分析得出结论,需要重启 MySQL 服务器(进程)才会将修改的主机名应用到数据库。所以这个实例名称 MySQL Shell API 是通过 MySQL 的主机名(SELECT @@hostname;
)和协议使用的端口号拼接而成的。因为 MySQL 8.0.27 版本以后的 MySQL 8.0 组复制通信堆栈 支持 MySQL 通信协议 作为可选项,且 InnoDB Cluster 默认采用 MySQL 通信堆栈。所以此处的端口号是 MYSQL 默认的 3306 ,而非 XCOM 默认的 33061 。
mysql> SELECT @@group_replication_communication_stack;
+-----------------------------------------+
| @@group_replication_communication_stack |
+-----------------------------------------+
| MYSQL |
+-----------------------------------------+
1 row in set (0.00 sec)
如果我们刚刚使用rescan()
呢?再现刚刚的问题:
mysql> update instances set addresses=json_replace(addresses,'$.mysqlClassic','r2-table:3306') where instance_id=4;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
}
MySQL ic-source:33060+ ssl JS > cluster.rescan()
Rescanning the cluster...
Result of the rescanning operation for the 'myCluster' cluster:
{
"name": "myCluster",
"newTopologyMode": null,
"newlyDiscoveredInstances": [],
"unavailableInstances": [],
"updatedInstances": [
{
"host": "replica3:3306",
"id": 4,
"label": "replica2",
"member_id": "2737b324-a0f6-11ea-afd1-000c2988ff33",
"old_host": "ic-replica2:3306"
}
]
}
The instance 'replica3:3306' is part of the cluster but its reported address has changed. Old address: ic-replica2:3306. Current address: replica3:3306.
Updating instance metadata...
The instance metadata for 'replica3:3306' was successfully updated.
MySQL ic-source:33060+ ssl JS > cluster.status()
{
"clusterName": "myCluster",
"defaultReplicaSet": {
"name": "default",
"primary": "ic-source:3306",
"ssl": "REQUIRED",
"status": "OK",
"statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
"topology": {
"ic-replica1:3306": {
"address": "ic-replica1:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
},
"ic-source:3306": {
"address": "ic-source:3306",
"memberRole": "PRIMARY",
"mode": "R/W",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
},
"replica2": {
"address": "replica3:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
}
},
"topologyMode": "Single-Primary"
},
"groupInformationSourceMember": "ic-source:3306"
}
再次印证了我刚刚的猜想。
结论
主机名这块,最关键的是 MySQL 的主机名(SELECT @@hostname;
),而非操作系统当前的主机名。 当然,只要 MySQL 服务器(进程)一重启,就会读取操作系统当前的主机名作为 MySQL 的主机名(SELECT @@hostname;
)。