在业务实践中,常见的 DDL 阻塞 原因是由于无法获取到 MDL 锁,即 Waiting for table metadata lock
。借助 metadata_lock
表,可以快速定位到 DDL 阻塞的原因。
元数据锁(MDL)
MDL加锁过程是系统自动控制,无需显式使用,在访问一张表的时候会自动加上。MDL锁主要作用是维护表元数据的数据一致性,在表上有活动事务的时候,不可以对元数据进行写入操作。主要是为了避免DML与DDL冲突,保证读写的正确性。
在MySQL5.5中引入了MDL,当对一张表进行增删改查的时候,加MDL读锁(共享),当对表结构进行变更操作的时候,加MDL写锁(排他)。
共享读与共享写都属于共享锁,他们两个互相兼容并于排他锁互斥。
查看数据库中元数据锁信息:
select object_type,object_schema,object_name,lock_type,lock_duration from performance_schema.metadata_locks ;
示例问题过程解析:
通过 show processlist
命令查询执行
alter table t1 add column d varchar(10),algorithm = inplace
后的状态。查询结果如下:
/*force_node='pi-bp10k7631d6k3****'*/ SHOW PROCESSLIST;
+-----------+-----------------+-----------------------+--------------------+----------------+---------+---------------------------------+-------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-----------+-----------------+-----------------------+--------------------+----------------+---------+---------------------------------+-------------------------------------------------------------+
| 98 | event_scheduler | localhost | NULL | Daemon | 1306586 | Waiting on empty queue | NULL |
| 109 | replicator | 11.111.XX.XX:62549 | NULL | Polar Log Dump | 1 | Reading log from innodb | NULL |
| 113 | replicator | 11.111.XX.XX:62560 | NULL | Polar Log Ack | 1 | Receiving from client | NULL |
| 133 | replicator | 10.13.64.70:42712 | NULL | Polar Log Dump | 1 | Reading log from innodb | NULL |
| 138 | replicator | 10.13.64.70:42723 | NULL | Polar Log Ack | 0 | Receiving from client | NULL |
| 369 | aurora | 10.111.211.209:33334 | NULL | Sleep | 0 | | NULL |
| 370 | aurora | 10.111.211.209:33336 | NULL | RDS Push LSN | 1306413 | starting | NULL |
| 372 | aurora | 10.111.204.224:37010 | NULL | Sleep | 0 | | NULL |
| 373 | aurora | 10.111.204.224:37019 | NULL | RDS Push LSN | 1306413 | starting | NULL |
| 3064011 | root | 127.0.0.1:59703 | NULL | Sleep | 716 | | NULL |
| 3064013 | root | 127.0.0.1:59710 | NULL | Sleep | 25 | | NULL |
| 3064015 | root | 127.0.0.1:59713 | NULL | Sleep | 55 | | NULL |
| 3064018 | root | 127.0.0.1:59716 | NULL | Sleep | 1 | | NULL |
| 3067041 | zyg_root | 172.17.XX.XX:48594 | test | Query | 22 | Waiting for table metadata lock | alter table t1 add column d varchar(10),algorithm = inplace |
| 3067443 | zyg_root | 172.17.XX.XX:48602 | test | Sleep | 27 | | NULL |
| 3069716 | aurora | 100.104.XX.XX:33017 | information_schema | Sleep | 30 | | NULL |
| 3069859 | aurora | 100.104.XX.XX:41872 | information_schema | Sleep | 30 | | NULL |
| 3069925 | aurora | 10.111.204.224:20916 | NULL | Sleep | 2 | | NULL |
| 3069932 | aurora | 10.111.211.209:51263 | NULL | Sleep | 2 | | NULL |
| 270526156 | zyg_root | 172.17.28.253:46272 | test | Query | 0 | starting | /*force_node='pi-bp10k7631d6k3****'*/ show processlist |
+-----------+-----------------+-----------------------+--------------------+----------------+---------+---------------------------------+-------------------------------------------------------------+
20 rows in set (0.00 sec)
可以看到线程3810574
持有了 test/t1
表的SHARED_READ
锁,导致线程3810086
无法获取到EXCLUSIVE
锁。借助performance_schema.threads
表,进一步获取对应线程的详细信息。查询结果如下:
/*force_node='pi-bp10k7631d6k3****'*/ SELECT * FROM performance_schema.threads WHERE THREAD_ID IN (3810086,3810574)\G
*************************** 1. row ***************************
THREAD_ID: 3810086
NAME: thread/sql/one_connection
TYPE: FOREGROUND
PROCESSLIST_ID: 3067041
PROCESSLIST_USER: zyg_root
PROCESSLIST_HOST: 172.17.28.253
PROCESSLIST_DB: test
PROCESSLIST_COMMAND: Query
PROCESSLIST_TIME: 41
PROCESSLIST_STATE: Waiting for table metadata lock
PROCESSLIST_INFO: alter table t1 add column d varchar(10),algorithm = inplace
PARENT_THREAD_ID: NULL
ROLE: NULL
INSTRUMENTED: YES
HISTORY: YES
CONNECTION_TYPE: TCP/IP
THREAD_OS_ID: 64852
RESOURCE_GROUP: NULL
*************************** 2. row ***************************
THREAD_ID: 3810574
NAME: thread/sql/one_connection
TYPE: FOREGROUND
PROCESSLIST_ID: 3067443
PROCESSLIST_USER: zyg_root
PROCESSLIST_HOST: 172.17.28.253
PROCESSLIST_DB: test
PROCESSLIST_COMMAND: Sleep
PROCESSLIST_TIME: 46
PROCESSLIST_STATE: NULL
PROCESSLIST_INFO: NULL
PARENT_THREAD_ID: NULL
ROLE: NULL
INSTRUMENTED: YES
HISTORY: YES
CONNECTION_TYPE: TCP/IP
THREAD_OS_ID: 65845
RESOURCE_GROUP: NULL
2 rows in set (0.01 sec)
-
可以看到线程
3810086
即为被阻塞的DDL线程,而线程3810574
则为一个慢查询。线程3810086
持有了锁,导致alter table t1 add column d varchar(10),algorithm = inplace
语句无法获取到MDL锁,因此处于阻塞状态。至此,您可以结合实际业务需求,等待事务提交或者使用KILL [processlist_id]
命令中止事务后,重新执行 -
alter table t1 add column d varchar(10),algorithm = inplace
。