PARTITION OR NOT PARTITION IN MYSQL
Bill Karwin says “In most circumstances, you’re better off using indexes instead of partitioning as your main method of query optimization.”
According to Rick James: “It is so tempting to believe that PARTITIONing will solve performance problems. But it is so often wrong.”
Let’s find out what’s going on by building a test case.
TWO TABLES READY
How many partitions? Rick James’s view: have 20-50 partitions; no more.
On this page, we use 10 partitions.
Remember: Always test your real case.
- Partition table with 10 partitions
CREATE TABLE points_partition (
    id INT NOT NULL AUTO_INCREMENT,
    x FLOAT,
    y FLOAT,
    z FLOAT,
    created_time DATETIME,
    PRIMARY KEY (id, created_time)
)
PARTITION BY RANGE( YEAR(created_time) ) (
    PARTITION p16 VALUES LESS THAN (2016),
    PARTITION p17 VALUES LESS THAN (2017),
    PARTITION p18 VALUES LESS THAN (2018),
    PARTITION p19 VALUES LESS THAN (2019),
    PARTITION p20 VALUES LESS THAN (2020),
    PARTITION p21 VALUES LESS THAN (2021),
    PARTITION p22 VALUES LESS THAN (2022),
    PARTITION p23 VALUES LESS THAN (2023),
    PARTITION p24 VALUES LESS THAN (2024),
    PARTITION p25 VALUES LESS THAN (2025)
);
- Normal table
CREATE TABLE points_full_table (
    id INT NOT NULL AUTO_INCREMENT,
    x FLOAT,
    y FLOAT,
    z FLOAT,
    created_time DATETIME,
    PRIMARY KEY (id, created_time)
);
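The ten yearly partitions of points_partition follow a regular pattern, so the clause can be generated instead of typed by hand. The sketch below is only an illustration: the helper name `yearly_range_partitions` is made up for this page, and the `pNN` naming simply mirrors the table above.

```python
def yearly_range_partitions(first_year, last_year, column="created_time"):
    """Build a PARTITION BY RANGE clause with one partition per year.

    Partition pNN holds rows whose YEAR(column) is less than 20NN,
    mirroring the naming used for points_partition.
    """
    lines = [
        f"PARTITION p{year % 100:02d} VALUES LESS THAN ({year})"
        for year in range(first_year, last_year + 1)
    ]
    body = ",\n".join(lines)
    return f"PARTITION BY RANGE( YEAR({column}) ) (\n{body}\n)"


clause = yearly_range_partitions(2016, 2025)
print(clause)
```

Note that the last partition only accepts rows with YEAR(created_time) below 2025. Without a catch-all partition (VALUES LESS THAN MAXVALUE), MySQL rejects rows that fall outside every range, so extend the list before such data arrives.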
Create millions of rows
For this test case, each table holds 10 million rows.
If you insert with plain SQL, example 2 (one multi-row INSERT) is better than example 1 (one INSERT per row).
-- sql example 1
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
-- sql example 2
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2"),
("data1", "data2"),
("data1", "data2");
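When the rows come from a program, the multi-row form of example 2 can be built with placeholders rather than by pasting values into the string. This is a minimal sketch, assuming a driver that uses `%s` placeholders (as the Python code on this page does); `multi_row_insert` and `flatten` are hypothetical helper names.

```python
def multi_row_insert(table, columns, row_count):
    """Return one INSERT statement with `row_count` placeholder groups."""
    if row_count < 1:
        raise ValueError("row_count must be at least 1")
    column_list = ", ".join(f"`{name}`" for name in columns)
    group = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values = ",\n".join([group] * row_count)
    return f"INSERT INTO `{table}` ({column_list}) VALUES {values};"


def flatten(rows):
    """Flatten [(a, b), (c, d)] into [a, b, c, d] to match the placeholders."""
    return [value for row in rows for value in row]


rows = [("data1", "data2")] * 3
statement = multi_row_insert("table1", ["field1", "field2"], len(rows))
params = flatten(rows)
print(statement)
print(params)
```

The statement and the flattened parameter list would then be passed together to the driver's `execute()` call.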
Add large data with tools
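import random

from faker import Faker


def insert_large_data(connection, nums=10):
    """Insert `nums` random rows into points_partition over a DB-API connection."""
    fake = Faker()
    # One (x, y, z, created_time) tuple per row, timestamps spread over the last 10 years.
    data = [(random.random(), random.random(), random.random(),
             str(fake.date_time_between(start_date='-10y', end_date='now')))
            for _ in range(nums)]
    sql = "INSERT INTO points_partition (x, y, z, created_time) VALUES (%s, %s, %s, %s)"
    # Execute the SQL with the tool of your choice; a DB-API cursor is assumed here.
    cursor = connection.cursor()
    cursor.executemany(sql, data)
    connection.commit()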
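Building 10 million tuples in one list is heavy on memory, so it may be easier to load the data in batches. The sketch below uses only the standard library to generate rows and assumes a DB-API style connection (`cursor()`, `executemany()`, `commit()`); the function names and the batch size of 10,000 are illustrative choices, not values from the test.

```python
import random
from datetime import datetime, timedelta


def random_rows(count, years=10, now=None):
    """Yield (x, y, z, created_time) tuples with timestamps in the last `years` years."""
    now = now or datetime.now()
    span_seconds = int(timedelta(days=365 * years).total_seconds())
    for _ in range(count):
        created = now - timedelta(seconds=random.randint(0, span_seconds))
        yield (random.random(), random.random(), random.random(),
               created.strftime("%Y-%m-%d %H:%M:%S"))


def batches(total, batch_size):
    """Yield batch sizes that add up to `total`."""
    remaining = total
    while remaining > 0:
        size = min(batch_size, remaining)
        yield size
        remaining -= size


def load(connection, table, total, batch_size=10_000):
    """Insert `total` random rows, one executemany() and one commit per batch."""
    sql = f"INSERT INTO {table} (x, y, z, created_time) VALUES (%s, %s, %s, %s)"
    cursor = connection.cursor()
    for size in batches(total, batch_size):
        cursor.executemany(sql, list(random_rows(size)))
        connection.commit()
```

Calling `load(connection, "points_partition", 10_000_000)` and then the same for `points_full_table` would fill both tables, although each table then gets its own random data rather than identical rows.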
DB-status
The partitioned table uses extra files to store its data and also takes extra disk space.
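One way to look at this from inside MySQL is the `information_schema.PARTITIONS` table, which lists estimated row counts and byte sizes per partition (a non-partitioned table shows up as a single row with a NULL partition name). The sketch below only holds the query and a small formatter; `summarize` is a hypothetical helper, and the sample rows are made up to show the output format, not measurements from this test.

```python
# Query to run with your own cursor, e.g.
# cursor.execute(PARTITION_SIZES_SQL, ("sample", "points_partition"))
PARTITION_SIZES_SQL = """
SELECT PARTITION_NAME, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s
ORDER BY PARTITION_ORDINAL_POSITION
"""


def summarize(rows):
    """Turn (name, rows, data_bytes, index_bytes) tuples into readable lines."""
    lines = []
    for name, table_rows, data_bytes, index_bytes in rows:
        total_mb = (data_bytes + index_bytes) / (1024 * 1024)
        label = name or "(not partitioned)"
        lines.append(f"{label}: ~{table_rows} rows, {total_mb:.1f} MB")
    return lines


# Made-up sample rows, only to show the output format; not measurements.
sample = [("p24", 1000, 1048576, 524288), (None, 5000, 2097152, 0)]
for line in summarize(sample):
    print(line)
```

For InnoDB, TABLE_ROWS is an estimate, so treat the numbers as a rough picture of how the data is spread.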
TEST RESULTS WITHOUT EXTRA INDEX(created_time)
test-1
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' limit 100;
FROM: explain
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | points_partition | p25 | ALL | NULL | NULL | NULL | NULL | 911625 | 33.33 | Using where |
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | points_full_table | NULL | ALL | NULL | NULL | NULL | NULL | 9747207 | 33.33 | Using where |
FROM: mysqlslap
# partition_table
Benchmark
Running for engine innodb
Average number of seconds to run all queries: 0.156 seconds
Minimum number of seconds to run all queries: 0.156 seconds
Maximum number of seconds to run all queries: 0.156 seconds
Number of clients running queries: 10
Average number of queries per client: 10
# full_table
Benchmark
Running for engine innodb
Average number of seconds to run all queries: 0.172 seconds
Minimum number of seconds to run all queries: 0.172 seconds
Maximum number of seconds to run all queries: 0.172 seconds
Number of clients running queries: 10
Average number of queries per client: 10
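The page does not show the exact mysqlslap command, so the following is only a guess at how a run like the one above could be started. It builds the argument list with real mysqlslap options (`--create-schema`, `--concurrency`, `--iterations`, `--query`, and optionally `--number-of-queries`), but the values and the helper name `mysqlslap_command` are assumptions. Connection options such as user and password are left out.

```python
import shlex


def mysqlslap_command(query, schema="sample", clients=10, number_of_queries=None):
    """Build a mysqlslap argument list for one query run by concurrent clients."""
    command = [
        "mysqlslap",
        f"--create-schema={schema}",
        f"--concurrency={clients}",
        "--iterations=1",
        f"--query={query}",
    ]
    if number_of_queries is not None:
        # Check the mysqlslap manual for how this count is split across clients.
        command.append(f"--number-of-queries={number_of_queries}")
    return command


query = ("select SQL_NO_CACHE * from sample.points_partition "
         "where created_time > '2024-01-01' limit 100;")
command = mysqlslap_command(query)
print(shlex.join(command))
```

The list form can be handed to `subprocess.run(command)` directly, which avoids shell quoting problems with the query text.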
In general, fewer touched rows should mean less query execution time.
Because this query only asks for a limited number of matching rows, with no ordering, MySQL can stop scanning as soon as it has found 100 of them, so the optimizer does a good job here.
The worst case for the full table is a full table scan, but finding just 100 matching rows in randomly distributed data takes far less time than that.
However, if we add an ORDER BY to the query, things look hugely different.
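The early stop can be pictured with a toy scan in plain Python. This is a simplified model and not MySQL's executor: the "table" is a list of random years, and the scan ends once 100 matching rows have been seen.

```python
import random


def rows_scanned_until_limit(rows, predicate, limit):
    """Scan rows in storage order and stop once `limit` matches are found."""
    matches = 0
    scanned = 0
    for row in rows:
        scanned += 1
        if predicate(row):
            matches += 1
            if matches == limit:
                break
    return scanned, matches


random.seed(0)
# Toy table: 1,000,000 rows with a year drawn uniformly from 2015..2024.
table = [random.randint(2015, 2024) for _ in range(1_000_000)]
scanned, matches = rows_scanned_until_limit(table, lambda year: year >= 2024, 100)
print(scanned, matches)
```

With about one row in ten matching, the scan needs only a small fraction of the table, which is why the EXPLAIN row estimate says little about the real cost of a LIMIT query without ORDER BY.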
test-2
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' order by created_time limit 100;
FROM: explain
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | points_partition | p25 | ALL | NULL | NULL | NULL | NULL | 911625 | 33.33 | Using where; Using filesort |
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | points_full_table | NULL | ALL | NULL | NULL | NULL | NULL | 9747207 | 33.33 | Using where; Using filesort |
FROM: mysqlslap
# partition table
Benchmark
Running for engine innodb
Average number of seconds to run all queries: 4.931 seconds
Minimum number of seconds to run all queries: 4.931 seconds
Maximum number of seconds to run all queries: 4.931 seconds
Number of clients running queries: 10
Average number of queries per client: 10
# full table
Benchmark
Running for engine innodb
Average number of seconds to run all queries: 54.652 seconds
Minimum number of seconds to run all queries: 54.652 seconds
Maximum number of seconds to run all queries: 54.652 seconds
Number of clients running queries: 10
Average number of queries per client: 10
There is a huge time gap between the two queries.
What’s going on?
With an “order by”,
the full table has to be scanned completely and every matching row sorted on that field before the first 100 can be returned, and that costs a lot;
the partitioned table only has to scan and sort one partition once the target partition has been located.
We always say: test your real case. This is how you find out whether your circumstances call for a partitioned table.
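The difference can be sketched with a toy model in Python. It is not how MySQL's filesort works internally; it only shows that both approaches return the same 100 rows while the pruned one examines far fewer. The row count and the names `top_n_full` and `top_n_pruned` are made up for this illustration.

```python
import heapq
import random
from collections import defaultdict
from datetime import datetime, timedelta

random.seed(1)
start = datetime(2015, 1, 1)
span = int((datetime(2025, 1, 1) - start).total_seconds())
# Toy table: 200,000 random timestamps from 2015 through 2024, in no particular order.
table = [start + timedelta(seconds=random.randrange(span)) for _ in range(200_000)]

# "Partitioned" copy: the same rows bucketed by year, like RANGE( YEAR(created_time) ).
partitions = defaultdict(list)
for created in table:
    partitions[created.year].append(created)

cutoff = datetime(2024, 1, 1)


def top_n_full(rows, n=100):
    """No pruning: every row is examined before the smallest n can be returned."""
    return heapq.nsmallest(n, (r for r in rows if r > cutoff)), len(rows)


def top_n_pruned(parts, n=100):
    """Pruning: only partitions that can hold rows after the cutoff are examined."""
    candidates = [r for year, rows in parts.items() if year >= cutoff.year for r in rows]
    return heapq.nsmallest(n, (r for r in candidates if r > cutoff)), len(candidates)


full_result, full_examined = top_n_full(table)
pruned_result, pruned_examined = top_n_pruned(partitions)
print(full_examined, pruned_examined, full_result == pruned_result)
```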
WHY: In most circumstances, you’re better off using indexes instead of partitioning
The tests are not done yet.
In the MySQL EXPLAIN output, the Extra field shows the message “Using filesort”.
Normally, you should consider an index here to improve performance: MYSQL: explain-extra-information
Let’s add an index.
ALTER TABLE `points_partition` ADD INDEX `created_time_index` (`created_time`);
ALTER TABLE `points_full_table` ADD INDEX `created_time_index` (`created_time`);
TEST RESULTS WITH INDEX
test-3
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' limit 100;
FROM: explain
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | points_partition | p25 | range | created_time_index | created_time_index | 5 | NULL | 455812 | 100.00 | Using index condition |
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | points_full_table | NULL | range | created_time_index | created_time_index | 5 | NULL | 2641784 | 100.00 | Using index condition; Using MRR |
FROM: mysqlslap
# partition table
Benchmark
Running for engine innodb
Average number of seconds to run all queries: 0.168 seconds
Minimum number of seconds to run all queries: 0.168 seconds
Maximum number of seconds to run all queries: 0.168 seconds
Number of clients running queries: 10
Average number of queries per client: 10
# full table
Benchmark
Running for engine innodb
Average number of seconds to run all queries: 0.368 seconds
Minimum number of seconds to run all queries: 0.368 seconds
Maximum number of seconds to run all queries: 0.368 seconds
Number of clients running queries: 10
Average number of queries per client: 10
Again: in general, fewer touched rows should mean less query execution time.
The new queries cost a little more time than the ones without the extra index.
What happened? EXPLAIN shows that “Using index condition” applies here.
But hold on, this is not the kind of query indexes are meant for.
Sometimes an index does not help when the goal is only to retrieve 100 matching rows. It helps in the worst case, but not in every case.
Let’s add an “order by” to see the magic.
test-4
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' order by created_time limit 100;
FROM: explain
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | points_partition | p25 | range | created_time_index | created_time_index | 5 | NULL | 455812 | 100.00 | Using index condition |
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | points_full_table | NULL | range | created_time_index | created_time_index | 5 | NULL | 2641784 | 100.00 | Using index condition |
FROM: mysqlslap
# partition table
Benchmark
Running for engine innodb
Average number of seconds to run all queries: 0.162 seconds
Minimum number of seconds to run all queries: 0.162 seconds
Maximum number of seconds to run all queries: 0.162 seconds
Number of clients running queries: 10
Average number of queries per client: 10
# full table
Benchmark
Running for engine innodb
Average number of seconds to run all queries: 0.185 seconds
Minimum number of seconds to run all queries: 0.185 seconds
Maximum number of seconds to run all queries: 0.185 seconds
Number of clients running queries: 10
Average number of queries per client: 10
The touched rows are the same as without “order by”,
but the query times are now really close.
This is what makes sense of “In this circumstance, you’re better off using indexes instead of partitioning”.
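Why the index makes “order by ... limit 100” cheap can be pictured with a sorted list and a binary search. This is a rough model of an ordered index, not InnoDB's B-tree: seek to the first entry after the cutoff, then read the next 100 entries, which are already in order. The integer values and the name `first_n_after` are assumptions for this sketch.

```python
import bisect
import random

random.seed(2)
# Toy created_time values as integers; 500,000 rows in random order.
table = [random.randrange(10_000_000) for _ in range(500_000)]

# A secondary index keeps the values sorted. Building it is the up-front cost.
index = sorted(table)


def first_n_after(sorted_index, cutoff, n=100):
    """Range scan: seek to the first entry > cutoff, then read n entries in order."""
    start = bisect.bisect_right(sorted_index, cutoff)
    return sorted_index[start:start + n]


cutoff = 9_000_000
via_index = first_n_after(index, cutoff)
# Without the index: filter everything, sort all matches, then keep the first 100.
via_sort = sorted(v for v in table if v > cutoff)[:100]
print(via_index == via_sort)
```

Both paths give the same rows, but the indexed one never sorts at query time, which matches the disappearance of “Using filesort” from the EXPLAIN output in test-4.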
After all, different types of queries are affected in different ways, and maintenance of partitions is also a big thing.
For example: select count(*) is much slower for partitioned tables, unless you count a single partition.
More tests?
Let’s stop here.
Comparison table (better view)
key/type | partition | normal | partition+order | normal+order | partition+index | normal+index | partition+order+index | normal+order+index |
---|---|---|---|---|---|---|---|---|
diskspace | ~590m | ~540m | ~590m | ~540m | ~750m | ~700m | ~750m | ~700m |
mysqlslap-benchmark | 0.156s | 0.172s | 4.931s | 54.652s | 0.168s | 0.368s | 0.162s | 0.185s |
mysql-explain-touched-rows | 911625 | 9747207 | 911625 | 9747207 | 455812 | 2641784 | 455812 | 2641784 |
index | / | / | / | / | created_time_index | created_time_index | created_time_index | created_time_index |
POINTS BASED ON TEST (mysqlslap & MySQL Workbench)
- Indexes work well without partitioning, and in most cases even better
- For range queries on the partition field, partitioned tables do work well
- Dropping partitions is much more efficient than a big DELETE
- If queries target a specific partition, performance is better
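The last two points map to two pieces of MySQL syntax: `ALTER TABLE ... DROP PARTITION` for bulk removal and explicit partition selection with `PARTITION (...)` in a query. The helpers below only build the statements as strings; their names are hypothetical, and the identifier check is there because table and partition names cannot be passed as bound parameters.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")


def _check(name):
    """Allow only plain identifiers, since names cannot be bound as parameters."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def drop_partition_sql(table, partition):
    """Remove a whole partition and its rows in one DDL statement."""
    return f"ALTER TABLE `{_check(table)}` DROP PARTITION {_check(partition)}"


def count_partition_sql(table, partition):
    """Count rows in one named partition using explicit partition selection."""
    return f"SELECT COUNT(*) FROM `{_check(table)}` PARTITION ({_check(partition)})"


print(drop_partition_sql("points_partition", "p16"))
print(count_partition_sql("points_partition", "p25"))
```

Dropping p16 here would remove every row created before 2016 without a row-by-row DELETE. It is irreversible, so it is worth trying on a copy of the table first.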
Other related points, documents & links:
- Partitioning mainly helps when your full table is larger than RAM
- No partitioning without million rows, Only BY RANGE provides any performance…
- Index order (DESC or ASC) is also important
- mysqlslap–benchmark tool
- questions about partition