基本信息:
线上一个库5.7.25库经常出现大量慢查询,在再次出现时登陆数据库进行分析,通过show engine innodb status 内容,发线程全部在等一个锁,这个锁极可能来源于这个truncate table动作:
---TRANSACTION 4764270793, ACTIVE 12 sec truncating table
mysql tables in use 1, locked 1
0 lock struct(s), heap size 1136, 0 row lock(s)
MySQL thread id 41, OS thread handle 140094989018880, query id 4942655207 System lock
Alter table xxxxx truncate partition p20210812
这个动作进行了12秒,仔细检查等锁的100来个线程,没有超过12秒的,几乎可以认定都是由这个truncate 动作引发的等待。
进一步分析:
分析正在等锁的线程,因为都一样,以其中一个线程为例展开分析:
--Thread 140093981456128 has waited at btr0sea.ic line 128 for 12.00 seconds the semaphore:
S-lock on RW-latch at 0x285e848 created in file btr0sea.cc line 230
a writer (thread id 140094867109632) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file btr0sea.ic line 128
Last time write locked in file /home/service/percona-server-5.7.25-28/storage/innobase/include/btr0sea.ic line 90
由于 show engine innodb status 内容,信息较少,结合现场抓到的堆栈信息分析,将上面Thread 140093981456128 转换成16进制,到得 7F6A2BFFF700,从打印的堆栈找到对应线程的堆栈如下:
Thread 227 (Thread 0x7f6a2bfff700 (LWP 480928)):
#0 0x00007f7b7e3dc965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000000107cbab in wait (this=<optimized out>) at /home/service/percona-server-5.7.25-28/storage/innobase/include/os0event.h:156
#2 wait_low (reset_sig_count=<optimized out>, this=0x285e880) at /home/service/percona-server-5.7.25-28/storage/innobase/os/os0event.cc:131
#3 os_event_wait_low (event=0x285e880, reset_sig_count=<optimized out>) at /home/service/percona-server-5.7.25-28/storage/innobase/os/os0event.cc:328
#4 0x000000000113af0b in sync_array_wait_event (arr=arr@entry=0x2407ef8, cell=@0x7f6a2bff88f0: 0x7f7b71aa7028) at /home/service/percona-server-5.7.25-28/storage/innobase/sync/sync0arr.cc:475
#5 0x000000000113ca1c in rw_lock_s_lock_spin (lock=lock@entry=0x285e848, pass=pass@entry=0, file_name=file_name@entry=0x15bd9f8 "/home/service/percona-server-5.7.25-28/storage/innobase/include/btr0sea.ic", line=line@entry=128) at /home/service/percona-server-5.7.25-28/storage/innobase/sync/sync0rw.cc:419
#6 0x00000000011b8c3e in rw_lock_s_lock_func (pass=0, line=128, file_name=0x15bd9f8 "/home/service/percona-server-5.7.25-28/storage/innobase/include/btr0sea.ic", lock=0x285e848) at /home/service/percona-server-5.7.25-28/storage/innobase/include/sync0rw.ic:433
#7 pfs_rw_lock_s_lock_func (pass=0, line=128, file_name=0x15bd9f8 "/home/service/percona-server-5.7.25-28/storage/innobase/include/btr0sea.ic", lock=0x285e848) at /home/service/percona-server-5.7.25-28/storage/innobase/include/sync0rw.ic:796
#8 btr_search_s_lock (index=0x7f64c9de9a08) at /home/service/percona-server-5.7.25-28/storage/innobase/include/btr0sea.ic:128
#9 btr_search_build_page_hash_index (index=0x7f64c9de9a08, block=block@entry=0x7f7734a5df40, n_fields=2, n_bytes=0, left_side=0) at /home/service/percona-server-5.7.25-28/storage/innobase/btr/btr0sea.cc:1519
#10 0x00000000011b9a62 in btr_search_info_update_slow (info=<optimized out>, cursor=cursor@entry=0x7f6a407669a8) at /home/service/percona-server-5.7.25-28/storage/innobase/btr/btr0sea.cc:826
#11 0x00000000011aad36 in btr_search_info_update (cursor=0x7f6a407669a8, index=0x7f64c9de9a08) at /home/service/percona-server-5.7.25-28/storage/innobase/include/btr0sea.ic:81
#12 btr_cur_search_to_nth_level (index=index@entry=0x7f64c9de9a08, level=level@entry=0, tuple=0x7f6a40767a10, mode=mode@entry=PAGE_CUR_LE, latch_mode=latch_mode@entry=1, cursor=cursor@entry=0x7f6a407669a8, has_search_latch=has_search_latch@entry=0, file=file@entry=0x15ae570 "/home/service/percona-server-5.7.25-28/storage/innobase/row/row0sel.cc", line=line@entry=3546, mtr=mtr@entry=0x7f6a2bffccf0) at /home/service/percona-server-5.7.25-28/storage/innobase/btr/btr0cur.cc:2015
#13 0x00000000010f9ccb in btr_pcur_open_with_no_init_func (latch_mode=1, file=0x15ae570 "/home/service/percona-server-5.7.25-28/storage/innobase/row/row0sel.cc", mtr=0x7f6a2bffccf0, line=3546, has_search_latch=0, cursor=0x7f6a407669a8, mode=PAGE_CUR_LE, tuple=<optimized out>, index=0x7f64c9de9a08) at /home/service/percona-server-5.7.25-28/storage/innobase/include/btr0pcur.ic:535
#14 row_sel_get_clust_rec_for_mysql (prebuilt=prebuilt@entry=0x7f6a40766678, sec_index=sec_index@entry=0x7f64c9a73648, rec=rec@entry=0x7f70cc62c58a "1000113461", thr=thr@entry=0x7f6a40983748, out_rec=out_rec@entry=0x7f6a2bffc0f0, offsets=offsets@entry=0x7f6a2bffc120, offset_heap=offset_heap@entry=0x7f6a2bffc110, vrow=0x0, mtr=mtr@entry=0x7f6a2bffccf0) at /home/service/percona-server-5.7.25-28/storage/innobase/row/row0sel.cc:3546
#15 0x0000000001102e10 in row_search_mvcc (buf=buf@entry=0x7f6a407e10e0 "\220\340\377\377\307\177\204\300\a\374\245\226院", mode=<optimized out>, mode@entry=PAGE_CUR_UNSUPP, prebuilt=0x7f6a40766678, match_mode=match_mode@entry=0, direction=direction@entry=1) at /home/service/percona-server-5.7.25-28/storage/innobase/row/row0sel.cc:5966
#16 0x0000000000feb3ab in ha_innobase::general_fetch (this=this@entry=0x7f6a403a42c0, buf=buf@entry=0x7f6a407e10e0 "\220\340\377\377\307\177\204\300\a\374\245\226院", direction=direction@entry=1, match_mode=match_mode@entry=0) at /home/service/percona-server-5.7.25-28/storage/innobase/handler/ha_innodb.cc:10237
#17 0x0000000000feb70d in ha_innobase::index_next (this=this@entry=0x7f6a403a42c0, buf=buf@entry=0x7f6a407e10e0 "\220\340\377\377\307\177\204\300\a\374\245\226院") at /home/service/percona-server-5.7.25-28/storage/innobase/handler/ha_innodb.cc:10312
#18 0x0000000001003e5d in ha_innopart::read_range_next_in_part (this=0x7f6a403a42c0, part=29, record=0x0) at /home/service/percona-server-5.7.25-28/storage/innobase/handler/ha_innopart.cc:2347
#19 0x0000000000bb4b58 in Partition_helper::handle_unordered_next (this=0x7f6a403a47c0, buf=0x7f6a407e10e0 "\220\340\377\377\307\177\204\300\a\374\245\226院", is_next_same=<optimized out>) at /home/service/percona-server-5.7.25-28/sql/partitioning/partition_handler.cc:3076
#20 0x00000000007b3882 in handler::multi_range_read_next (this=0x7f6a403a42c0, range_info=0x7f6a2bffd2e0) at /home/service/percona-server-5.7.25-28/sql/handler.cc:6859
#21 0x0000000000dd94eb in QUICK_RANGE_SELECT::get_next (this=0x7f67c9235680) at /home/service/percona-server-5.7.25-28/sql/opt_range.cc:11248
#22 0x0000000000bb684a in rr_quick (info=0x7f67c98796d0) at /home/service/percona-server-5.7.25-28/sql/records.cc:399
#23 0x0000000000c25d97 in sub_select (join=0x7f67c8f85fe8, qep_tab=0x7f67c9879680, end_of_records=<optimized out>) at /home/service/percona-server-5.7.25-28/sql/sql_executor.cc:1280
#24 0x0000000000c1ec57 in do_select (join=0x7f67c8f85fe8) at /home/service/percona-server-5.7.25-28/sql/sql_executor.cc:950
#25 JOIN::exec (this=this@entry=0x7f67c8f85fe8) at /home/service/percona-server-5.7.25-28/sql/sql_executor.cc:199
#26 0x0000000000c1aa24 in TABLE_LIST::materialize_derived (this=this@entry=0x7f67c801bf30, thd=thd@entry=0x7f67c8a98300) at /home/service/percona-server-5.7.25-28/sql/sql_derived.cc:326
#27 0x0000000000c1f67f in join_materialize_derived (tab=<optimized out>) at /home/service/percona-server-5.7.25-28/sql/sql_executor.cc:2506
#28 0x0000000000c1f172 in QEP_TAB::prepare_scan (this=0x7f67c8a7cd80) at /home/service/percona-server-5.7.25-28/sql/sql_executor.cc:1331
#29 0x0000000000c25c90 in sub_select (join=0x7f67c8f85c00, qep_tab=0x7f67c8a7cd80, end_of_records=<optimized out>) at /home/service/percona-server-5.7.25-28/sql/sql_executor.cc:1231
#30 0x0000000000c1ec57 in do_select (join=0x7f67c8f85c00) at /home/service/percona-server-5.7.25-28/sql/sql_executor.cc:950
#31 JOIN::exec (this=0x7f67c8f85c00) at /home/service/percona-server-5.7.25-28/sql/sql_executor.cc:199
#32 0x0000000000c8d61d in handle_query (thd=thd@entry=0x7f67c8a98300, lex=lex@entry=0x7f67c8a9a740, result=result@entry=0x7f67c801de08, added_options=added_options@entry=0, removed_options=removed_options@entry=0) at /home/service/percona-server-5.7.25-28/sql/sql_select.cc:185
#33 0x0000000000774714 in execute_sqlcom_select (thd=thd@entry=0x7f67c8a98300, all_tables=<optimized out>) at /home/service/percona-server-5.7.25-28/sql/sql_parse.cc:5467
#34 0x0000000000c5145d in mysql_execute_command (thd=thd@entry=0x7f67c8a98300, first_level=first_level@entry=true) at /home/service/percona-server-5.7.25-28/sql/sql_parse.cc:2994
#35 0x0000000000c545f5 in mysql_parse (thd=thd@entry=0x7f67c8a98300, parser_state=parser_state@entry=0x7f6a2bffe6f0, update_userstat=update_userstat@entry=false) at /home/service/percona-server-5.7.25-28/sql/sql_parse.cc:5901
#36 0x0000000000c55196 in dispatch_command (thd=thd@entry=0x7f67c8a98300, com_data=com_data@entry=0x7f6a2bffedc0, command=COM_QUERY) at /home/service/percona-server-5.7.25-28/sql/sql_parse.cc:1528
#37 0x0000000000c56b37 in do_command (thd=thd@entry=0x7f67c8a98300) at /home/service/percona-server-5.7.25-28/sql/sql_parse.cc:1053
#38 0x0000000000d12348 in handle_connection (arg=arg@entry=0x2b4adb0) at /home/service/percona-server-5.7.25-28/sql/conn_handler/connection_handler_per_thread.cc:318
#39 0x0000000000ef80b1 in pfs_spawn_thread (arg=0x297d550) at /home/service/percona-server-5.7.25-28/storage/perfschema/pfs.cc:2190
#40 0x00007f7b7e3d8dd5 in start_thread () from /lib64/libpthread.so.0
#41 0x00007f7b7d09e02d in clone () from /lib64/libc.so.6
这个堆栈非常长,是在做一个比较复杂的查询,挑选靠近等锁部分函数分析:row_search_mvcc -> row_sel_get_clust_rec_for_mysql ->btr_pcur_open_with_no_init_func->btr_cur_search_to_nth_level->btr_search_info_update->btr_search_info_update_slow->btr_search_build_page_hash_index->btr_search_s_lock->pfs_rw_lock_s_lock_func。
前面是常规的调用路径,从 btr_cur_search_to_nth_level 函数开始分析,当打开 AHI 的时候,会调用btr_search_info_update 来更新相应的统计信息,如果当前的索引的统计到被扫描次数达到17次,则要调用 btr_search_info_update_slow 更新 search info & block info 信息,也就是说要将经常使用的索引加到AHI中,此时需要对AHI的加锁,根因就是这把锁加不上,都在等这个锁:
如上代码,在btr_search_build_page_hash_index 中对index加锁,这个index就是AHI的锁,所有线程都是卡这等锁,但等的不一定是同一把锁,因为AHI 根据 innodb_adaptive_hash_index_parts 参数拆分成了多把,但等锁的根因在此,一直加不上这个锁就一直在等,导致慢查询。
为什么这个锁长时间拿不到?
再来分析最可疑的truncate动作在干什么,同样根据Thread 140093981456128 转换的指针 0x7f6a680e2700找到对应堆栈如下:
Thread 306 (Thread 0x7f6a680e2700 (LWP 1923)):
#0 0x00000000011f88f7 in buf_LRU_drop_page_hash_for_tablespace (id=<optimized out>, buf_pool=<optimized out>) at /home/service/percona-server-5.7.25-28/storage/innobase/buf/buf0lru.cc:287
#1 buf_LRU_flush_or_remove_pages (id=id@entry=141593, buf_remove=buf_remove@entry=BUF_REMOVE_ALL_NO_WRITE, trx=trx@entry=0x0) at /home/service/percona-server-5.7.25-28/storage/innobase/buf/buf0lru.cc:994
#2 0x000000000124b3fc in fil_reinit_space_header_for_table (table=table@entry=0x7f6920aecf48, size=7, trx=trx@entry=0x7f7b716a71d0) at /home/service/percona-server-5.7.25-28/storage/innobase/fil/fil0fil.cc:3275
#3 0x000000000110aef5 in row_truncate_table_for_mysql (table=<optimized out>, trx=0x7f7b716a71d0) at /home/service/percona-server-5.7.25-28/storage/innobase/row/row0trunc.cc:2090
#4 0x0000000000fff2a4 in ha_innopart::truncate (this=0x7f686f1d5420) at /home/service/percona-server-5.7.25-28/storage/innobase/handler/ha_innopart.cc:3211
#5 0x0000000000e2939d in truncate_partition (this=0x7f686f1d59f8) at /home/service/percona-server-5.7.25-28/sql/partitioning/partition_handler.h:278
#6 Sql_cmd_alter_table_truncate_partition::execute (this=<optimized out>, thd=0x7f6a3c0008c0) at /home/service/percona-server-5.7.25-28/sql/sql_partition_admin.cc:787
#7 0x0000000000c4e4e2 in mysql_execute_command (thd=thd@entry=0x7f6a3c0008c0, first_level=first_level@entry=true) at /home/service/percona-server-5.7.25-28/sql/sql_parse.cc:5132
#8 0x0000000000c545f5 in mysql_parse (thd=0x7f6a3c0008c0, parser_state=parser_state@entry=0x7f6a680e1c20, update_userstat=update_userstat@entry=true) at /home/service/percona-server-5.7.25-28/sql/sql_parse.cc:5901
#9 0x0000000000e5331f in Query_log_event::do_apply_event (this=0x7f6a480f7450, rli=0x7f6a48023da0, query_arg=0x7f6a4842a342 "Alter table rpt_ads_base_busi_game_app_perf_m truncate partition p20210812", q_len_arg=<optimized out>) at /home/service/percona-server-5.7.25-28/sql/log_event.cc:4916
#10 0x0000000000eb531f in slave_worker_exec_job_group (worker=worker@entry=0x7f6a48023da0, rli=rli@entry=0x29f21b0) at /home/service/percona-server-5.7.25-28/sql/rpl_rli_pdb.cc:2743
#11 0x0000000000e986b3 in handle_slave_worker (arg=arg@entry=0x7f6a48023da0) at /home/service/percona-server-5.7.25-28/sql/rpl_slave.cc:6270
#12 0x0000000000ef80b1 in pfs_spawn_thread (arg=0x7f6a48027c40) at /home/service/percona-server-5.7.25-28/storage/perfschema/pfs.cc:2190
#13 0x00007f7b7e3d8dd5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f7b7d09e02d in clone () from /lib64/libc.so.6
显然这正是在做truncate的动作,这里的做buf_LRU_flush_or_remove_pages 时,会调用到 buf_LRU_drop_page_hash_for_tablespace 函数,从堆栈上可以看出最底层的函数buf_LRU_drop_page_hash_for_tablespace 并没有等任何锁,应该是在做大任务,分析该函数代码:
它要对buffer_pool instance加锁,然后遍历该Buffer pool的每个页面,和当前被Truncate的表所有页面比较,然后在该buffer pool中清理掉当前表的页面,期间如果对该buffer pool instance的页面申请操作都会等,一直等truncate table完成。并且这个操作不是只操作一个buffer pool instance,所有的都要,也就是说要把整个buffer pool内存都扫一遍:
假如要清理的表占用1万个页面(约160MB),buffer pool总有占有50万个脏页,双重循环运算量非常大。虽然说全内存很快,但这个量太大了,导致加锁时间会比较长,虽然buffer pool的已经拆分降低等锁时间,但这里还是容易成为瓶颈点,这是强制全部检查清理的设计逻辑似乎可以优化。虽然说情理上看,锁住buf pool其它线程就无法再向该buf pool申请页面,拒绝AHI操作勉强说的通,但AHI那边等锁就一定有人拿着它锁,继续向上层分析。看上层函数 fil_reinit_space_header_for_table :
在做 buf_LRU_flush_or_remove_pages 清理前,会对所有的AHI 加锁,不管拆分了几个,大表truncate时几乎逃不过卡的命运了,只要开了AHI,经常命中的数据就可能要被挂AHI上,那么就需要加这个锁,而这个锁在truncate过程中是不会释放的,那么就可能会导致所有的查询都等在这。
小结:
truncate table动作要对所有AHI加锁,禁止更新AHI,等被truncate的表在全部buf pool中清理完自己的页面再放锁。查询线程因为打开了AHI,在查询过程中会将高频命中的索引加到AHI中去,提高查询效率,该过程要对index对应的AHI加锁,但该锁被truncate动作持有,要等着truncate完成,导致等在这,并且这里truncate是要持有所有AHI的锁,完全禁止AHI更新,所以truncate A 表时查B表一样可能会被卡住。
优化建议:
1. 将这个分区的truncate改成drop+create
当 innodb_file_per_table = on 时,也就是每个表都有独立ibd文件时,truncate就等于drop + create,下面是官方文档说明:
官方在对drop做了lazy drop优化,不需要立马清理buf pool,持锁时间短, 理论会更好。但要是并发删除多个大表可能会带来其它性能问题,如果仅是单线程drop大表可以尝试。
2. 删除动作前禁用AHI功能,删除后再打开。
AHI是一个锦上添花的功能,只会对已经在Cache中多次命中的索引建立自适应哈希索引,也就是说内存中已经有了,优化点在于走常规索引搜索链路可能3~4层,走AHI索引检索一两层就能找到。这些都是内存操作,临时关掉影响应该不大,可以做尝试。但这个设置对从库不生效,row模式Binlog中不会记录这个AHI参数的变化。
3. 拆分成多条Delete。
如果没有时效性要求,可以将一个完整的truncate拆分成多个Delete慢慢执行,好处是主从都不会有大负荷,delete不需要强制清buf pool页面,坏处是会产生很多binlog(row模式)。truncate表只有一条binlog,DELETE就纯看表数据,如果有表1000万行就会产生1000万条删除的binlog,执行时间可能是直接truncate的几十倍甚至几百倍都可能,取决于表的数据量。
4. 升级MySQL
从上面分析可以看出来,这个问题明显是官方设计上的不够完美导致,搜了一下发现官方也是一直知道这个问题的,终于在5.7.34和8.0.23对此做了优化,所以要升级版本后能解决该问题。
好事的后记
慢的根本原因就是truncate时对所有的AHI缓存instance加了锁,导致其它要用AHI的内存线程加不上这个锁,所以等在这。一定也有人很好奇,官方是怎么修复的?
官方的修复方式让人惊掉下巴!!!
一共改了两行代码,把 fil_reinit_space_header_for_table 函数中对所有的AHI加锁的逻辑去掉了,并且是直接去掉,没有其它任何改动。官方说经过仔细研究,这里的加锁是多此一举,前面的逻辑已经可以确保不要这个锁不会有问题的。。。
这么多年,因此而引发的多少线上故障,皆付笑谈中。。。
patch SHA: f849f27e01f5e414bcaa995a3ba460782a1f8766 ,有兴趣的可以自己去看看。