I have a table like this, it has millions of records:
CREATE TABLE `myTable` (
`DateTime` DATETIME NOT NULL,
`Col1` MEDIUMINT UNSIGNED NOT NULL,
`Col2` MEDIUMINT UNSIGNED NOT NULL,
`Col3` MEDIUMINT UNSIGNED NOT NULL,
`Col4` MEDIUMINT UNSIGNED NOT NULL,
`Event` MEDIUMINT UNSIGNED NOT NULL,
`State` MEDIUMINT UNSIGNED NOT NULL,
PRIMARY KEY (`DateTime`,`Col4`,`Event`,`State`)
FROM_UNIXTIME(UNIX_TIMESTAMP(MIN(`DateTime`))-(UNIX_TIMESTAMP(MIN(`DateTime`)) % (3*60))) as 'Period',
count(*) as 'NumberOfRecords'
`DateTime` > '2016-09-01' and `DateTime` < '2016-09-09'
AND `Col1` IN (3, 6, 11, 14, etc... )
AND `Col2` IN (5 ,25 , 325 , 293, 294, etc.... )
AND `Col3` IN (3 , 9 , 95 , 395 , 435, etc...)
AND `Col4` IN (124, 125, 135, 325, etc...)
UNIX_TIMESTAMP(`DateTime`) DIV (3*60);
CREATE INDEX `myIndex` ON `myTable` ( `DateTime`, `Col`,`Col2`,`Col3`,`Col4 )
This query is very hard to optimize given your five-column table structure, because you're running up to six different range predicates.
A range predicate includes operations
IN(). Basically, any type of search other than
A range predicate potentially matches many values in the column.
An equality predicate matches exactly one value in the column (there may be many rows with that value, but it's one value).
When defining indexes, the columns you put in the index should be columns references in equality comparisons first, then just one column referenced in a range predicate. Any additional columns in the index beyond the first column referenced in a range predicate will not count for doing the lookup.
For example, if you have an index on
(col1, col2, col3), the following conditions:
WHERE col1=123 AND col2 IN (4, 5, 6) AND col3=789
This query can make use of the first two columns of the index. The
col3 will not use the index. The query will examine all rows matched by the first two terms, and evaluate the third term against all those rows one by one.
Whereas the same index will use all three columns for lookup by the following conditions:
WHERE col1=123 AND col2=789 AND col3 IN (4, 5, 6)
That is, equality predicates for the first two columns, and a range predicate for the last column in the index.
When you use EXPLAIN, one of the columns reports the number of bytes of an index entry. In the above example, suppose all three columns are 32-bit integer columns. EXPLAIN for the first query will report that it uses 8 bytes (two integer's worth), and EXPLAIN for second query will report it uses 12 bytes (three integer's worth).
In your case, you have range predicates for all the terms in your conditions. This is not optimizable with a B-Tree index. It could use an index for any one of the columns. So you might create five individual indexes, each on one of the columns, and hope that the optimizer picks the one that narrows down the search most effectively. Or you can use index hints to pick the best index yourself.
There are other types of indexes besides B-tree indexes. A range predicate search over multiple columns may require an R-tree index. So you may find that to really optimize this query, you need to load a copy of the data into Apache Solr or Crate or some other search engine like that.