MYSQL PARTITIONING分区操作和性能测试

news2025/1/19 8:19:19

PARTITION OR NOT PARTITION IN MYSQl

Bill Karwin says “In most circumstances, you’re better off using indexes instead of partitioning as your main method of query optimization.”
According to RICK JAMES: “It is so tempting to believe that PARTITIONing will solve performance problems. But it is so often wrong.”
let’s find out what’s going on by building a test case

TWO TABLES READY

How many partitions? views from Rick James: Have 20-50 partitions; no more.
In this page, we do 10 partitions
Remember: Always test your real case.

  1. Partition table with 10 partitions
CREATE TABLE points_partition 
(id INT NOT NULL AUTO_INCREMENT,
 x FLOAT,
 y FLOAT,
 z FLOAT,
 created_time DATETIME,
 PRIMARY KEY(id, created_time))
PARTITION BY RANGE( YEAR(created_time) ) (
	PARTITION p16 VALUES less than (2016),
	PARTITION p17 VALUES less than (2017),
    PARTITION p18 VALUES less than (2018),
	PARTITION p19 VALUES less than (2019),
    PARTITION p20 VALUES less than (2020),
	PARTITION p21 VALUES less than (2021),
    PARTITION p22 VALUES less than (2022),
	PARTITION p23 VALUES less than (2023),
    PARTITION p24 VALUES less than (2024),
	PARTITION p25 VALUES less than (2025)
) ;

  1. Normal table
CREATE TABLE points_full_table 
(id INT NOT NULL AUTO_INCREMENT,
 x FLOAT,
 y FLOAT,
 z FLOAT,
 created_time DATETIME,
 PRIMARY KEY(id, created_time));

Create millions of rows

For test case, each table holds 10 millions of rows
If using mysql to insert, example 2 is better than example 1

-- sql example 1
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
-- sql example 2
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2"),
                                                 ("data1", "data2"),
                                                 ("data1", "data2");

Add large data with tools

from faker import Faker
import random

def insert_large_data(nums=10):
    fake = Faker()

    data = [(random.random(), random.random(), random.random(),
             str(fake.date_time_between(start_date='-10y', end_date='now'))) for i in range(nums)]
             
    cursor = connection.cursor()

    sql = f"INSERT INTO points_partition (x, y, z, created_time) VALUES (%s, %s, %s, %s)"

    # execute sql with your idea tool

DB-status

partition table take extra files to preserve data, also, extra disk space
请添加图片描述
partition table
请添加图片描述

TEST RESULTS WITHOUT EXTRA INDEX(created_time)

test-1
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25ALL91162533.33Using where
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tableALL974720733.33Using where

FROM:mysqlslap

# partition_table
Benchmark
	Running for engine innodb
	Average number of seconds to run all queries: 0.156 seconds
	Minimum number of seconds to run all queries: 0.156 seconds
	Maximum number of seconds to run all queries: 0.156 seconds
	Number of clients running queries: 10
	Average number of queries per client: 10
# full_table
Benchmark
	Running for engine innodb
	Average number of seconds to run all queries: 0.172 seconds
	Minimum number of seconds to run all queries: 0.172 seconds
	Maximum number of seconds to run all queries: 0.172 seconds
	Number of clients running queries: 10
	Average number of queries per client: 10

In general, it is expected that fewer touched rows would result in less time for query execution.
since this query only required limit rows under condition without order, mysql optimizer is doing a good job here.
the worse case for the full table is that do a full table scan, but to get just 100 target rows from random data, much less time is needed.

however, if we put a order by in where clause, things will be a huge different.

test-2
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' order by created_time limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25ALL91162533.33Using where; Using filesort
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tableALL974720733.33Using where; Using filesort

FROM:mysqlslap

# partition table
Benchmark
	Running for engine innodb
	Average number of seconds to run all queries: 4.931 seconds
	Minimum number of seconds to run all queries: 4.931 seconds
	Maximum number of seconds to run all queries: 4.931 seconds
	Number of clients running queries: 10
	Average number of queries per client: 10
# full table
Benchmark
	Running for engine innodb
	Average number of seconds to run all queries: 54.652 seconds
	Minimum number of seconds to run all queries: 54.652 seconds
	Maximum number of seconds to run all queries: 54.652 seconds
	Number of clients running queries: 10
	Average number of queries per client: 10

A huge time gap between two queries.
what’ going on?
under condition of “order by”
a full table needs a full table-field sort, that’s cost a lot,
a partition table only need to sort a partition after located target partition.
we always say: test your real case, by this way, you find your circumstance to do a partition table.

WHY:In most circumstances, you’re better off using indexes instead of partitioning

the test are not done yet
From mysql explain, the extra field print a message: “Using filesort”
normally, you should considering a index here to improve performance: MYSQL: explain-extra-information

let’s add a index

ALTER TABLE `points_partition` ADD INDEX `created_time_index` (`created_time`);
ALTER TABLE `points_full_table` ADD INDEX `created_time_index` (`created_time`);

TEST RESULTS WITH INDEX

test-3
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25rangecreated_time_indexcreated_time_index5455812100.00Using index condition
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tablerangecreated_time_indexcreated_time_index52641784100.00Using index condition; Using MRR

FROM: mysqlslap

# partition table
Benchmark
	Running for engine innodb
	Average number of seconds to run all queries: 0.168 seconds
	Minimum number of seconds to run all queries: 0.168 seconds
	Maximum number of seconds to run all queries: 0.168 seconds
	Number of clients running queries: 10
	Average number of queries per client: 10
# full table
Benchmark
	Running for engine innodb
	Average number of seconds to run all queries: 0.368 seconds
	Minimum number of seconds to run all queries: 0.368 seconds
	Maximum number of seconds to run all queries: 0.368 seconds
	Number of clients running queries: 10
	Average number of queries per client: 10

again: In general, it is expected that fewer touched rows would result in less time for query execution.
new queries cost a little more time than without extra index.
what happens? explain shows “condition index” are being used here.
stop here, it’s not how indexes are introduced.
sometimes, index is not help if the goal was retrieve 100 target rows. the worst case, yes, but not all.

let’s put a “order by” to see the magic

test-4
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' order by created_time limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25rangecreated_time_indexcreated_time_index5455812100.00Using index condition
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tablerangecreated_time_indexcreated_time_index52641784100.00Using index condition

FROM: mysqlslap

# partition table
Benchmark
	Running for engine innodb
	Average number of seconds to run all queries: 0.162 seconds
	Minimum number of seconds to run all queries: 0.162 seconds
	Maximum number of seconds to run all queries: 0.162 seconds
	Number of clients running queries: 10
	Average number of queries per client: 10
# full table
Benchmark
	Running for engine innodb
	Average number of seconds to run all queries: 0.185 seconds
	Minimum number of seconds to run all queries: 0.185 seconds
	Maximum number of seconds to run all queries: 0.185 seconds
	Number of clients running queries: 10
	Average number of queries per client: 10

same touched rows as no “order by”.
but the time cost of queries are getting really closed.
makes sense “In this circumstance, you’re better off using indexes instead of partitioning”.
after all, there are different types of queries were influenced and Maintenance of PARTITION is also a big thing.
For example: select count() is much slower for partition tables. unless doing a partition count()

more tests?
let’s stop here

table vs (better view)

key/typepartitionnormalpartition+ordernormal+orderpartition+indexnormal+indexpartition+order+indexnormal+order+index
diskspace~590m~540m~590m~540m~750m~700m~750m~700m
mysqlslap-benchmark0.156s0.172s4.931s54.652s0.168s0.368s0.162s0.185s
mysql-explain-touched-rows9116259747207911625974720745581226417844558122641784
index////created_time_indexcreated_time_indexcreated_time_indexcreated_time_index

POINTS BASED ON TEST(mysqlslap & mysql workbench)

  1. Index works good without partitioning, most of cases even better
  2. Under condition of range query by partition field, partitioning tables works good indeed
  3. drop partitions is much more efficient when doing a big delete
  4. if queries use specific partition, performance will better

Other Points Related & documents & Links:

  1. Partitioning mainly helps when your full table is larger than RAM
  2. No partitioning without million rows, Only BY RANGE provides any performance…
  3. index order(DESC or ASC) is also important
  4. mysqlslap–benchmark tool
  5. questions about partition

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/2256053.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

[软件工程]九.可依赖系统(Dependable Systems)

9.1什么是系统的可靠性(reliability) 系统的可靠性反映了用户对系统的信任程度。它反映了用户对其能够按照预期运行且正常使用中不会失效的信心程度。 9.2什么是可依赖性(dependablity)的目的 其目的是覆盖系统的可用性&#x…

vue3中使用watchEffect和watch函数时应当防止内存泄漏

官方文档:https://cn.vuejs.org/api/reactivity-core.html#watcheffect 也就是说当使用他们两个时候,使用完成之后要及时停止他们,防止一直在运行,停止他们之后,也可以再次开启。 watchEffect()​ 立即运行一个函数…

Wwise SoundBanks内存优化

1.更换音频格式为Vorbis 2.停用多余的音频,如Random Container的随机脚步声数量降为2个 3.背景音乐勾选“Stream”。这样就让音频从硬盘流送到Wwise,而不是保存在内存当中,也就节省了内存 4.设置最大发声数Max Voice Instances 5.设置音频…

Windows宝塔面板下IIS环境如何部署SSL证书?

Windows宝塔面板下IIS环境如何部署SSL证书? 平时服务器linux宝塔用的较多,所以linux系统宝塔,如何部署SSL证书还是比较熟悉,今天遇到一个windows的部署SSL证书,还是头一次,所以记录一下,以防忘…

【计算机视觉】图像的几何变换

最常见的几何变换有仿射变换和单应性变换两种,最常用的仿射变换有缩放、翻转、旋转、平移。 1. 缩放 将图像放大或缩小会得到新的图像,但是多出的像素点如何实现----插值 1.1 插值方法 最近邻插值 双线性插值 cv2.resize() 是 OpenCV 中用于调整图像…

深入浅出 Go 语言:数组与切片

深入浅出 Go 语言:数组与切片 引言 在 Go 语言中,数组和切片是两种非常重要的数据结构,用于存储和操作一组相同类型的元素。虽然它们看起来相似,但在使用上有很大的区别。理解数组和切片的区别以及如何正确使用它们,…

基于超级电容和电池的新能源汽车能量管理系统simulink建模与仿真

目录 1.课题概述 2.系统仿真结果 3.核心程序与模型 4.系统原理简介 4.1 超级电容特性 4.2 电池特性 5.完整工程文件 1.课题概述 基于超级电容和电池的新能源汽车能量管理系统simulink建模与仿真。分析不同车速对应的电池,超级电容充放电变化情况。 2.系统仿…

y3编辑器文档3:物体编辑器

文章目录 一、物体编辑器简介1.1 界面介绍1.2 复用(导入导出)1.3 收藏夹(项目资源管理)1.4 对象池二、单位2.1 数据设置2.2 表现设置2.3 单位势力和掉率设置2.4 技能添加和技能参数修改2.5 商店2.5.1 商店属性设置2.5.2 商店物品设置三、装饰物3.1 属性编辑3.2 碰撞体积四、…

「嵌入式系统设计与实现」书评:学习一个STM32的案例

本文最早发表于电子发烧友论坛:【新提醒】【「嵌入式系统设计与实现」阅读体验】 学习一个STM32的案例 - 发烧友官方/活动 - 电子技术论坛 - 广受欢迎的专业电子论坛!https://bbs.elecfans.com/jishu_2467617_1_1.html 感谢电子发烧友论坛和电子工业出版社的赠书。 …

Qt Designer Ui设计 功能增加

效果展示 输入密码,密码错误,弹出提示 密码正确,弹出提示并且关闭原窗口 代码(只提供重要关键主代码)lxh_log.py代码: import sysfrom PySide6.QtWidgets import QApplication, QWidget, QPushButtonfrom …

RT Thread Studio新建STM32F407IG工程文件编译提示错误

编译提示错误 原因: RT 源码使用4.0.3的话,请用STM32F4支持包的0.2.2版本,就不会出错了。 如果支持包用0.2.3版本的话,需要用RT内核4.1.0版本。0.2.3 版本更新了一些针对内核4.1.0的驱动代码,这几个定义都是4.1.0里的。

智能制造标准体系建设指南

一、智能制造系统架构总览 智能制造作为当今制造业转型升级的核心,深度整合了新一代信息技术与传统制造工艺,催生出一个横跨产品全生命周期、纵贯多层级组织架构,并彰显多元智能特性的复杂系统。这一架构从生命周期、系统层级、智能特征三个…

DApp开发与APP开发的五大区别

随着比特币与区块链技术的不断发展,DApp应用会逐渐成为主流。与APPAPP相比,DApp有许多不同之处,尤其是在架构、数据存储、用户隐私等方面。本文将通过五大关键点,深入探讨DApp开发与APP开发之间的主要区别。 1. 后端架构&#xff…

XSS(DOM)-HIGH错误总结

HIGH就不从简单的开始。 我们直接闭合HTML标签绕过 ></option></select><img srcx:alert(alt) οnerrοreval(src) altxss> 没有变化 这里应该是后端的问题&#xff0c;试试锚点注入 English#<script>alert(xss)</script> 这里不知道什么…

Mitel MiCollab 企业协作平台 任意文件读取漏洞复现(CVE-2024-41713)

0x01 产品简介 Mitel MiCollab是加拿大Mitel(敏迪)公司推出的一款企业级协作平台,旨在为企业提供统一、高效、安全的通信与协作解决方案。通过该平台,员工可以在任何时间、任何地点,使用任何设备,实现即时通信、语音通话、视频会议、文件共享等功能,从而提升工作效率和…

【PostgreSQL系列】列类型从整数转换为 UUID

&#x1f49d;&#x1f49d;&#x1f49d;欢迎来到我的博客&#xff0c;很高兴能够在这里和您见面&#xff01;希望您在这里可以感受到一份轻松愉快的氛围&#xff0c;不仅可以获得有趣的内容和知识&#xff0c;也可以畅所欲言、分享您的想法和见解。 推荐:kwan 的首页,持续学…

【原生js案例】webApp实现鼠标移入移出相册放大缩小动画

图片相册这种动画效果也很常见&#xff0c;在我们的网站上。鼠标滑入放大图片&#xff0c;滑出就恢复原来的大小。现在我们使用运动定时器来实现这种滑动效果。 感兴趣的可以关注下我的系列课程【webApp之h5端实战】&#xff0c;里面有大量的css3动画效果制作原生知识分析&…

Java项目实战II基于微信小程序的消防隐患在线举报系统(开发文档+数据库+源码)

目录 一、前言 二、技术介绍 三、系统实现 四、核心代码 五、源码获取 全栈码农以及毕业设计实战开发&#xff0c;CSDN平台Java领域新星创作者&#xff0c;专注于大学生项目实战开发、讲解和毕业答疑辅导。获取源码联系方式请查看文末 一、前言 随着城市化进程的加快&…

饲料颗粒机全套设备有哪些机器组成

颗粒饲料机主要用于将各种饲料原料&#xff08;如玉米、豆粕、麦麸、鱼粉等&#xff09;进行混合、压制&#xff0c;制成颗粒状的饲料。这种饲料不仅方便储存和运输&#xff0c;还能提高动物的采食效率和饲料利用率。同时&#xff0c;颗粒饲料在加工过程中能灭部分微生物和寄生…