最近生产上每个星期都会有几次死锁告警异常,今天终于给处理了,待后续观察,记录下整个过程。
环境:springboot、mybatis、mysql(RC隔离级别)
表结构:
CREATE TABLE `table1` (
`id` bigint NOT NULL AUTO_INCREMENT,
`prize_key` varchar(128) ,
'case_id'
`in_case_index` int,
PRIMARY KEY (`id`),
KEY `idx_case_id` (`case_id`)
);
阿里云日志告警:
### Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
; Deadlock found when trying to get lock; try restarting transaction; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
立刻查找生产mysql死锁日志:
show engine innodb status;
------------------------
LATEST DETECTED DEADLOCK
------------------------
2023-06-26 14:19:21 0x7fb2ddf141ff700
*** (1) TRANSACTION:
TRANSACTION 2221516621, ACTIVE 0 sec fetching rows
mysql tables in use 2, locked 2
LOCK WAIT 676 lock struct(s), heap size 73936, 6770 row lock(s), undo log entries 10
MySQL thread id 14996730, OS thread handle 14040d64831044096, query id 462343171910 10.1.4.1 lottery executing
insert into X
( prize_key,
prize_type,
*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 666 page no 864646 n bits 112 index PRIMARY of table `xx`.`table1` trx id 2221516621 lock mode S locks rec but not gap
Record lock, heap no 8 PHYSICAL RECORD: n_fields 33; compact format; info bits 128
0: len 8; hex 80000000d001bdb6d7; asc ;;
1: len 6; hex 00008468599e; asc hY ;;
2: len 7; hex 01000003571172; asc W r;;
3: len 30; hex 64643332343238332d313032652d1346335652d623131342d633537316663; asc dd324283-102e-4c5e-b114-c571fc; (total 36 bytes);
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 666 page no 864646 n bits 112 index PRIMARY of table `xx`.`table1` trx id 2221516621 lock mode S locks rec but not gap waiting
Record lock, heap no 42 PHYSICAL RECORD: n_fields 33; compact format; info bits 128
0: len 8; hex 8000000000bdb6f3; asc ;;
1: len 6; hex 00008469a753; asc i S;;
2: len 7; hex 020000096b30c4; asc k0 ;;
3: len 30; hex 36356265633039662d633136662d3432323112d623063352d353933626362; asc 65bec09f-c16f-4221-b0c5-593bcb; (total 36 bytes);
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 666 page no 864646 n bits 112 index PRIMARY of table `xx`.`table1` trx id 2221516627 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 33; compact format; info bits 128
0: len 8; hex 8000000000bdb6d1; asc ;;
1: len 6; hex 00008469a753; asc i S;;
2: len 7; hex 020000096b162a; asc k *;;
3: len 30; hex 3038343738306d6322d323362362d3464611652d616333612d313639626136; asc 084780f2-23b6-4dae-ac3a-169ba6; (total 36 bytes);
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 666 page no 864646 n bits 112 index PRIMARY of table `xx`.`table1` trx id 2221516627 lock_mode X locks rec but not gap waiting
Record lock, heap no 33 PHYSICAL RECORD: n_fields 33; compact format; info bits 128
0: len 8; hex 8000000000bdb6f0; asc ;;
1: len 6; hex 000084685661; asc hVa;;
2: len 7; hex 81000001350143; asc 5 C;;
3: len 30; hex 66643332656336622d363536662d343761392d6166139342d333563653634; asc fd32ec6b-656f-47a9-af94-35ce64; (total 36 bytes);
*** WE ROLL BACK TRANSACTION (2)
根据事物1和事物2的sql,以及java报错的堆栈信息,找到发生问题的代码。
大致如下:
// 事物1
for () {
xxmaper.insert();
}
//事物2
xxmaper.update(list);
大概看了下,俩个事物不会存在插入和修改的数据冲突。
只能查看mysql审计日志了,通过阿里云后台,用MySQL thread id,分别查到 俩个事物所有运行的sql。大致如下:
--事物1:
-- 10个insert
insert into table1
( prize_key,prize_type, `case_id`,`in_case_index` )
values ( 'cf2c9336-4bcc-44f8-b57a-e9a020c9e2f3','Z', 11133, (SELECT a.ci+1 from (SELECT IFNULL(MAX( `in_case_index`),0) as ci FROM `table1` WHERE `case_id` = 111) as a) );
-- 事物2
-- 100多个update
update table1 where prize_key in ('','');
看到最终的sql,有点眉目了,再次查看mysql死锁日志。得到如下结果:
事物1执行到sql,
insert into table1
( prize_key,prize_type, `in_case_index` )
values ( '1cf2c9336-4bcc-44f8-b57a-e9a020c9e2f3','Z',
(SELECT a.ci+1 from (SELECT IFNULL(MAX( `in_case_index`),0) as ci FROM `table1` WHERE `case_id` = 68033) as a) )
拿到s锁:
HOLDS THE LOCK(S): 666 page no 864646 n bits 112 index PRIMARY of table `xx`.`table1` trx id 2221516621 lock mode S locks rec but not gap
具体s锁记录:386fd81cc-4689-4f20-89d1-e98967,fd32ec6b-656f-47a9-af94-35ce64,
等待s锁:RECORD LOCKS space id 666 page no 864646 n bits 112 index PRIMARY of table `xx`.`table1` trx id 2221516621 lock mode S locks rec but not gap waiting
具体等待记录:65bec09f-c16f-4221-b0c5-593bcb
事物2:
拿到x锁: HOLDS THE LOCK(S): RECORD LOCKS space id 666 page no 864646 n bits 112 index PRIMARY of table `xx`.`table1` trx id 2221516627 lock_mode X locks rec but not gap
具体x锁记录:084780f2-23b6-4dae-ac3a-169ba6,,65fee89c-2d22-4abd-a8d1-0d9357
等待X锁:RECORD LOCKS space id 666 page no 864646 n bits 112 index PRIMARY of table `xx`.`table1` trx id 2221516627 lock_mode X locks rec but not gap waiting
具体等待记录记录:fd32ec6b-656f-47a9-af94-35ce64,
说白了,就是:
事物1拿到了S锁,等待X锁; 这个X锁被事物2拿到了;
事物2拿到了X锁,等待S锁;这个S锁被事物1拿到了;
分析已经很到位了,下面就是本地复现。
将相关sql复制到本地,开两个窗口,关闭自动提交。确实复现了。
最终解决方案:
因为事物1拿到的是s锁,且是主键锁,所以最简单办法就是 select里面,避免去回表。
SELECT a.ci+1 from
(SELECT IFNULL(MAX( `in_case_index`),0) as ci FROM `table1` WHERE `case_id` = 1111) as a
执行计划看到的是走的索引idx_case_id。
于是给这个索引加了一个组合,idx_case_id(case_id,in_case_index);
再次走 执行计划,确实没有回表了。
至此mysql死锁问题解决,后面就是静静的等待结果了,没有结果就是最好的结果。