MySQL Question

How do I solve a GROUP BY performance issue in MySQL?

I'm building an IoT system for home appliances.

My data table was created as follows:

mysql> SHOW CREATE TABLE DataM1\G
*************************** 1. row ***************************
Table: DataM1
Create Table: CREATE TABLE `DataM1` (
`sensor_type` text,
`sensor_name` text,
`timestamp` datetime DEFAULT NULL,
`data_type` text,
`massimo` float DEFAULT NULL,
`minimo` float DEFAULT NULL,
KEY `timestamp_id` (`timestamp`) USING BTREE,
KEY `super_index_id` (`timestamp`,`sensor_name`(11),`data_type`(11)) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8


and the query is

SELECT
sensor_type, sensor_name, timestamp, data_type,
MAX(massimo) as massimo, MIN(minimo) as minimo
FROM DataM1
WHERE timestamp >= NOW() - INTERVAL 1 HOUR
GROUP BY timestamp, sensor_type, sensor_name, data_type;


Now, the problem is that when the table reaches 4 million rows (a few days of data), the query takes 50+ seconds.

Edit: EXPLAIN result is as following:

id: 1
select_type: SIMPLE
table: DataM1
partitions: p0,p1,p2,p3,p4,p5,p6
type: range
possible_keys: timestamp_id,super_index_id
key: timestamp_id
key_len: 6
ref: NULL
rows: 1
filtered: 100.00
Extra: Using index condition; Using temporary; Using filesort


Edit: a sample row of the result is:

*************************** 418037. row ***************************
sensor_type: SEN
sensor_name: SEN_N2
timestamp: 2016-10-16 17:28:48
data_type: flow_rate
massimo: 17533.8
minimo: 17533.5


Is there a way to speed this up with indexing, sharding, and/or partitioning?
Or is it better to re-think the table and separate the information into different tables? If so, could anyone suggest a best practice for such a situation?

Answer
  • Do not use "prefix" indexing such as sensor_name(11); it rarely helps and sometimes hurts.
  • If sensor_name, sensor_type, and data_type can't be more than 255 characters, don't use TEXT; use VARCHAR(...) with some realistic limit instead (see the revised definition sketched after this list).
  • Normalize sensor_name, sensor_type, and data_type -- I assume their values are repeated a lot. ENUM is a reasonable alternative.
  • KEY(timestamp) and KEY(timestamp, ...) are redundant; DROP the former.
  • Your table needs a PRIMARY KEY. If no column (or set of columns) is unique, then use an AUTO_INCREMENT.
  • Perhaps you don't want to start the GROUP BY with the exact timestamp. Maybe truncate to the hour? For example, CONCAT(LEFT(timestamp, 13), ':xx') would yield something like 2016-10-16 20:xx (a revised query along these lines is sketched after this list).
  • The main reason the query takes so long is that it is outputting 418K rows. What will you do with that many rows? I see no LIMIT and no ORDER BY. Will that continue to be the case?
  • Partitioning and sharding will not help the speed any.
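
Putting the schema points together, here is a minimal sketch of a revised definition. The VARCHAR(32) limits and the surrogate id column are my assumptions, not taken from your data; adjust them to what your sensors actually produce:

CREATE TABLE DataM1 (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- surrogate PRIMARY KEY (assumption)
  sensor_type VARCHAR(32) NOT NULL,                  -- VARCHAR instead of TEXT
  sensor_name VARCHAR(32) NOT NULL,
  timestamp   DATETIME NOT NULL,
  data_type   VARCHAR(32) NOT NULL,
  massimo     FLOAT DEFAULT NULL,
  minimo      FLOAT DEFAULT NULL,
  PRIMARY KEY (id),
  KEY super_index_id (timestamp, sensor_name, data_type)  -- no prefixes; replaces both old KEYs
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

And the query, grouping by the hour instead of the exact timestamp:

SELECT
    sensor_type, sensor_name,
    CONCAT(LEFT(timestamp, 13), ':xx') AS hr,  -- '2016-10-16 17:xx'
    data_type,
    MAX(massimo) AS massimo, MIN(minimo) AS minimo
FROM DataM1
WHERE timestamp >= NOW() - INTERVAL 1 HOUR
GROUP BY sensor_type, sensor_name, hr, data_type;

Since the WHERE clause covers only one hour, per-hour grouping shrinks the result to at most two rows per sensor and data_type (the window can straddle an hour boundary) instead of 418K.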

Those suggestions will help in various ways. Once you have fixed most of them, we can discuss how to use Summary Tables to get a 10x speedup.
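
As a preview, one common shape for such a summary table is keyed by hour and sensor and is folded up periodically with INSERT ... ON DUPLICATE KEY UPDATE. This is a sketch only; the table name DataM1_hourly, the column sizes, and the once-per-hour schedule are my assumptions:

CREATE TABLE DataM1_hourly (
  hr          DATETIME NOT NULL,        -- timestamp truncated to the hour
  sensor_type VARCHAR(32) NOT NULL,
  sensor_name VARCHAR(32) NOT NULL,
  data_type   VARCHAR(32) NOT NULL,
  massimo     FLOAT DEFAULT NULL,
  minimo      FLOAT DEFAULT NULL,
  PRIMARY KEY (hr, sensor_type, sensor_name, data_type)
) ENGINE=InnoDB;

-- Run periodically (e.g. once an hour) to roll up the latest raw rows:
INSERT INTO DataM1_hourly (hr, sensor_type, sensor_name, data_type, massimo, minimo)
SELECT
    DATE_FORMAT(timestamp, '%Y-%m-%d %H:00:00'),
    sensor_type, sensor_name, data_type,
    MAX(massimo), MIN(minimo)
FROM DataM1
WHERE timestamp >= NOW() - INTERVAL 1 HOUR
GROUP BY 1, sensor_type, sensor_name, data_type
ON DUPLICATE KEY UPDATE
    massimo = GREATEST(massimo, VALUES(massimo)),
    minimo  = LEAST(minimo, VALUES(minimo));

Reports then read the small summary table instead of scanning millions of raw rows, which is where the roughly 10x speedup comes from.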
