慢SQL优化笔记：从3041ms到10ms的优化之旅

1 背景

最近项目上要求开发一个查询接口，刚开发完成后，测试环境进行自测发现查询效率非常慢，通过打log计算得到SQL执行时间3041ms，实在太慢了。下面简单记录下本次慢SQL优化的过程。

2 相关数据

新增客户表（其它无关字段省略）

create table new_customer_info(
	id bigint(32) not null auto_increment comment '唯一id' ,
	ent_name varchar(255) default null comment '企业名称',
	area_name varchar(50) defalut null comment '区域',
	register_date datetime default null comment '注册时间',
	phone_number varchar(50) defalut null comment '手机号',
	del_flag char(1) default '0' comment '删除标志，0：正常，1：删除',
	sms_flag char(1) defalut '0' comment '获取手机号标志，0：已获取，1：未获取',
	will_status char(1) default null comment '合作意向，0：无，1：未联系上，2：有合作意向',
	...
	primary key (id)
);

区域表（其它无关字段省略）

create table area_data(
	area_code varchar(50) default null comment '地区代码',
	area_name varchar(100) default null comment '地区名称',
	parent_name varchar(100) default null comment '上级地区名称',
	...
	key idx_area_code (area_code)
);

字典数据表（其它无关字段省略）

create table sys_dict_data(
	dict_label varchar(100) default '' comment '字典标签',
	dict_value varchar(100) default '' comment '字典键值',
	dict_type varchar(100) default '' comment '字典类型',
	...
	key idx_dict_type (dict_type) using btree,
	key idx_dict_value (dict_value) using btree
)

开发的查询接口功能：
该接口返回new_customer_info表中所有已经电话回访后的企业信息列表。

3 原始SQL

SELECT 
	nci.net_name entName,
	nci.area_name areaName,
	ad.parent_name cityName,
	add.dict_label industry,
	date_format(register_date,'%Y-%m-%d') registerDate,
	nci.phone_number phoneNumber,
	nci.will_status willStatus
FROM new_customer_info nci 
LEFT JOIN area_data ad ON ad.area_code = nci.area_name 
LEFT JOIN sys_dict_data sdd ON (nci.industry = sdd.dict_value AND ad.dict_type ='train_industry')
WHERE nci.del_flag = '0' AND nci.sms_flag = '0';

对应的xml语句：

<select id='getNewCustomerInfo' resultType='com.javacoder.domain.NewCustomerVo'>

	select 
		nci.net_name entName,
		nci.area_name areaName,
		ad.parent_name cityName,
		add.dict_label industry,
		date_format(register_date,'%Y-%m-%d') registerDate,
		nci.phone_number phoneNumber,
		nci.will_status willStatus
	FROM new_customer_info nci 
	LEFT JOIN area_data ad ON ad.area_code = nci.area_name 
	LEFT JOIN sys_dict_data sdd ON (nci.industry = sdd.dict_value AND ad.dict_type ='train_industry')
	WHERE nci.del_flag = '0' AND nci.sms_flag = '0'
	<if test="areaName != null and areaName != ''">
		and nci.area_name like concat('%',concat(#{entName}),'%'))
	</if>
	<if test="entName != null and entName != ''">
		and nci.ent_name like concat('%',concat(#{entName}),'%'))
	</if>
	<if test="willStatus != null and willStatus != ''">
		and nci.will_status like concat('%',concat(#{willStatus}),'%'))
	</if>
</select>

4 优化思路

分析SQL执行计划：使用EXPLAIN命令查看SQL的执行计划，分析查询过程中涉及的表、索引、连接方式等信息，找出性能瓶颈
优化索引：

检查相关表的索引情况，确保new_customer_info表的del_flag、sms_flag、area_name、industry字段，area_data表的area_code、dict_type字段，sys_dict_data表的dict_value字段都建立了合适的索引。

考虑创建组合索引，例如在new_customer_info表上创建(del_flag, sms_flag, area_name, industry)组合索引，以提高查询效率
优化连接方式：

将LEFT JOIN改为INNER JOIN，如果area_data和sys_dict_data表中没有匹配的数据，则不需要返回new_customer_info表中的数据。

调整连接顺序，将过滤条件较多的表放在前面连接，以减少中间结果集的大小
优化查询条件：

将WHERE条件中的常量条件提前，例如将ad.dict_type ='train_industry’放在LEFT JOIN条件中。

使用EXISTS或IN子查询代替JOIN操作，如果子查询结果集较小
优化代码逻辑：
通过优化代码执行逻辑，来提高SQL执行效率。

5 优化过程

5.1 分析SQL执行计划

explain SELECT 
	nci.net_name entName,
	nci.area_name areaName,
	ad.parent_name cityName,
	add.dict_label industry,
	date_format(register_date,'%Y-%m-%d') registerDate,
	nci.phone_number phoneNumber,
	nci.will_status willStatus
FROM new_customer_info nci 
LEFT JOIN area_data ad ON ad.area_code = nci.area_name 
LEFT JOIN sys_dict_data sdd ON (nci.industry = sdd.dict_value AND ad.dict_type ='train_industry')
WHERE nci.del_flag = '0' AND nci.sms_flag = '0';

explain 输出结果中各个字段的含义:

id 列：查询的标识符。
select_type 列：查询的类型。常见的类型有：

SIMPLE：简单查询，不包含子查询或者 UNION 查询。
PRIMARY：查询中如果包含子查询，则最外层查询被标记为 PRIMARY。
SUBQUERY：子查询。
DERIVED：派生表的 SELECT，FROM 子句的子查询。
table 列：查的哪个表。
type 列：表示 MySQL 在表中找到所需行的方式，性能从最优到最差分别为：system > const > eq_ref > ref > range > index > ALL。

system，表只有一行，一般是系统表，往往不需要进行磁盘 IO，速度非常快
const、eq_ref、ref：这些类型表示 MySQL 可以使用索引来查找单个行，其中 const 是最优的，表示查询最多返回一行。
range：只检索给定范围的行，使用索引来检索。在where语句中使用 bettween…and、<、>、<=、in 等条件查询 type 都是 range。
index：遍历索引树读取。
ALL：全表扫描，效率最低。
possible_keys 列：可能会用到的索引，但并不一定实际被使用。
key 列：实际使用的索引。如果为 NULL，则没有使用索引。
key_len 列：MySQL 决定使用的索引长度（以字节为单位）。当表有多个索引可用时，key_len 字段可以帮助识别哪个索引最有效。通常情况下，更短的 key_len 意味着数据库在比较键值时需要处理更少的数据。
ref 列：用于与索引列比较的值来源。

const：表示常量，这个值是在查询中被固定的。例如在 WHERE column = 'value’中。
一个或多个列的名称，通常在 JOIN 操作中，表示 JOIN 条件依赖的字段。
NULL，表示没有使用索引，或者查询使用的是全表扫描。
rows 列：估算查到结果集需要扫描的数据行数，原则上 rows 越少越好。
Extra 列：附加信息。

Using index：表示只利用了索引。
Using where：表示使用了 WHERE 过滤。
Using temporary ：表示使用了临时表来存储中间结果。

5.2 优化索引

在new_customer_info表上创建组合索引

CREATE INDEX idx_nci_del_sms_area_industry ON new_customer_info (del_flag, sms_flag, area_name, industry);

在area_data表上创建组合索引

CREATE INDEX idx_ad_area_code_dict_type ON area_data (area_code, dict_type);

在在sys_dict_data表上创建组合索引

CREATE INDEX idx_sdd_dict_value_dict_type ON sys_dict_data (dict_value,dict_type);

优化结果：SQL执行耗时从3041ms降低到2780ms，效果并不明显，需继续优化。

5.3 优化连接方式

将LEFT JOIN改为INNER JOIN

SELECT 
	nci.net_name entName,
	nci.area_name areaName,
	ad.parent_name cityName,
	add.dict_label industry,
	date_format(register_date,'%Y-%m-%d') registerDate,
	nci.phone_number phoneNumber,
	nci.will_status willStatus
FROM new_customer_info nci 
inner JOIN area_data ad ON ad.area_code = nci.area_name 
inner JOIN sys_dict_data sdd ON (nci.industry = sdd.dict_value AND ad.dict_type ='train_industry')
WHERE nci.del_flag = '0' AND nci.sms_flag = '0';

优化结果：不可行，实际可能会存在new_customer_info表中area_name与area_data表中的area_code对应不上的情况，如果用INNER JOIN，会导致查询出的数据不完整。

5.4 优化代码逻辑

优化前代码逻辑：先查询出全部的new_customer_info表数据，然后用Stream流过滤掉willStatus为null的数据，最后返回。在这个过程中，由于new_costomer_info表的数据量很大，area_data表和sys_dict_data表的数据量较小，通过LEFT JOIN连接三表查询效率很慢。
优化后代码逻辑：先将过滤willStatus为空的逻辑放到SQL语句中，减少连接的数据量。最后无需再通过Stream流过滤数据，直接返回查询结果即可。
优化后的SQL

SELECT 
	nci.net_name entName,
	nci.area_name areaName,
	ad.parent_name cityName,
	add.dict_label industry,
	date_format(register_date,'%Y-%m-%d') registerDate,
	nci.phone_number phoneNumber,
	nci.will_status willStatus
FROM new_customer_info nci 
inner JOIN area_data ad ON ad.area_code = nci.area_name 
inner JOIN sys_dict_data sdd ON (nci.industry = sdd.dict_value AND ad.dict_type ='train_industry')
WHERE nci.del_flag = '0' AND nci.sms_flag = '0' and nci.will_status is not null;

对应的xml语句

<select id='getNewCustomerInfo' resultType='com.javacoder.domain.NewCustomerVo'>

	select 
		nci.net_name entName,
		nci.area_name areaName,
		ad.parent_name cityName,
		add.dict_label industry,
		date_format(register_date,'%Y-%m-%d') registerDate,
		nci.phone_number phoneNumber,
		nci.will_status willStatus
	FROM new_customer_info nci 
	LEFT JOIN area_data ad ON ad.area_code = nci.area_name 
	LEFT JOIN sys_dict_data sdd ON (nci.industry = sdd.dict_value AND ad.dict_type ='train_industry')
	WHERE nci.del_flag = '0' AND nci.sms_flag = '0'
	<if test="areaName != null and areaName != ''">
		and nci.area_name like concat('%',concat(#{entName}),'%'))
	</if>
	<if test="entName != null and entName != ''">
		and nci.ent_name like concat('%',concat(#{entName}),'%'))
	</if>
	<if test="willStatus != null and willStatus != ''">
		and nci.will_status like concat('%',concat(#{willStatus}),'%'))
	</if>
	and nci.will_status is not null
</select>

优化结果：SQL执行耗时从2780ms降低到300ms，效果明显，但还有优化空间。

5.5 最后的优化

再次使用explain分析SQL执行计划，发现SQL执行过程中new_customer_info表输出结果的type列返回的是ALL，表示全表扫描。
优化方案：在new_customer_info表上创建索引

create index idx_will_status on new_customer_info (will_status);

优化结果：SQL执行耗时从300ms降低到10ms以内，效果显著，达到了预期目标。

6 总结

SQL优化是一个需要不断尝试和调整的过程，需要结合具体的业务场景和数据库特性进行分析和优化。通过业务逻辑代码优化、合理的索引设计、连接方式优化、查询条件优化等手段，可以有效提升SQL语句的执行效率，提高系统性能。

慢SQL优化笔记：从3041ms到10ms的优化之旅

1 背景

2 相关数据

3 原始SQL

4 优化思路

5 优化过程

5.1 分析SQL执行计划

5.2 优化索引

5.3 优化连接方式

5.4 优化代码逻辑

5.5 最后的优化

6 总结

相关文章

网络丢包定位记录(三)

robomimic应用教程（一）——模型训练

S-Clustr-Simple 飞机大战:骇入现实的建筑灯光游戏

超实用的 Typora 插件

C++入门(03)萌新问题多（二）

解决uniapp开发的app,手机预览，上下滑动页面，页面出现拉伸，抖动的效果问题，

[Leetcode] 227.基本计算器

linux 获取指定端口的PID netsat awk

vcs/verdi常用命令（持续更新）

mockito+junit搞定单元测试(2h)

HashMap扩容时机是插入前还是插入后？

如何设置 Django 错误邮件通知？

STM32F407单片机编程入门(十二) FreeRTOS实时操作系统详解及实战含源码

智慧校园建设解决方案建设系统简介

数据采集与预处理，前后端结合案例（有代码），Python连接MySQL，对MySQL的增删改查

数据结构：内部排序

软考（中级-软件设计师）计算机系统篇（0921）

AI直播新浪潮:无人视频自动直播，出圈再造辉煌，创业者首选！

MQ（RabbitMQ）笔记

2024年华为杯-研赛F题论文问题一二讲解+代码分享