Edwin Lee Edwin Lee - 4 months ago 25
SQL Question

MySQL - Uneven Distribution of Data into Partitions When Using Key Partitioning


  1. I'm using the InnoDB engine on MySQL 5.7.

  2. I have a table where one of the columns is a (non-unique) three-letter country code (e.g. "SGP" for Singapore, "JPN" for Japan, etc).

  3. For most of my queries, this country code column is the first WHERE clause I filter by (e.g.
    WHERE COUNTRY_CODE = 'SGP'
    )

  4. Hence, I want to (sub-)partition the table by this column. Since most of my queries will be on a single country code, they will only hit one partition in this way.

  5. However, due to the large number of different country codes, I do not want to use LIST partitioning where I have to explicitly cater for every single country code.

  6. So I used KEY partitioning, with 8 partitions. I thought that key partitioning, whereby the value is hashed, would give me a more-or-less even distribution over the 8 partitions (don't have to be perfect).

  7. However, what I experienced is that out of the 8 partitions, 4 of them are completely untouched.



This is an abstract of my CREATE TABLE statement:

CREATE TABLE TBL_EATING_PLACES (
ID INT UNSIGNED AUTO_INCREMENT NOT NULL,
TYPE_OF_FOOD SMALLINT UNSIGNED NOT NULL,
SUBTYPE_OF_FOOD SMALLINT UNSIGNED NOT NULL,
COUNTRY_CODE CHAR(3) NOT NULL,
ADDRESS VARCHAR(255),
...
OTHER_NON_RELEVANT_COLUMNS ...,
...,
CONSTRAINT PKEY PRIMARY KEY (ID, TYPE_OF_FOOD, SUBTYPE_OF_FOOD, COUNTRY_CODE)
)
ENGINE = InnoDB
PARTITION BY LIST COLUMNS(TYPE_OF_FOOD, SUBTYPE_OF_FOOD) SUBPARTITION BY KEY(COUNTRY_CODE) SUBPARTITIONS 8 (
PARTITION P_1_1 VALUES IN ((1, 1)),
PARTITION P_1_2 VALUES IN ((1, 2)),
PARTITION P_2_1 VALUES IN ((2, 1)),
PARTITION P_1_2 VALUES IN ((2, 2)),
PARTITION P_1_3 VALUES IN ((2, 2)),
);


Is there anything wrong with how I'm doing the KEY partitioning such that it ended up only hitting half of the partitions?

Answer

What's wrong with key partitioning? It provides zero benefit. Don't use it. Instead, provide suitable composite indexes that match your queries.

(Added to address questions in comments...)

Usually a composite index can do the equivalent of what partitioning does. The "partition key" does "partition pruning" to pick the one (or few) partitions to look into. By having the the "partition key" as the first column in an index, you get the same effect. (Yes, there are exceptions.)

Partitions have some overhead. Each partition is a file; opening files is costly. In some cases, all partitions are opened before doing the pruning. It used to be that there was no pruning on INSERT. (Yuck!) (Some of these issues have been addressed in newer version, but there is still some overhead.)

I have looked at many examples of subpartitioning and non-RANGE partitioning. I have seen only 4 cases where indexing won't do "as good" as partitioning. I assume you found my blog that lists the 4. Here's one copy: Partition Maintenance.

A 2-dimensional search needs to "reduce the search space". This is one of the 4 cases. RANGE partitioning handles one dimension, the PRIMARY KEY handles the other. That works efficiently (but with messy code) to Find the 10 nearest pizza parlors.

BY RANGE is the only partitioning that can handle a 'range' of values (such as a date range). HASH will simply search all the partitions.

BY LIST may be just as good as BY RANGE, but only for exact values. And then I go back to saying "why not put the partition key on the front of whatever index you would otherwise use"!

I will happily augment my blog if someone can find a 5th use case for which I can't provide equivalent performance without partitioning.

Comments