Felix Felix - 12 days ago
MySQL Question

Optimizing a MySQL LEFT JOIN query

We want to map the entries of calibration_data to the rows of the calibration table with the following query. However, in my opinion it takes far too long (more than 24 hours).

Is there any optimization possible?
For testing, we added more indexes than we currently need, but they had no impact on the duration.

[Edit]

The hardware shouldn't be the biggest bottleneck:


  • 128 GB RAM

  • 1TB SSD RAID 5

  • 32 cores



EXPLAIN result

+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                                          |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+------------------------------------------------+
|  1 | SIMPLE      | cal   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2009 |   100.00 | Using temporary; Using filesort                |
|  1 | SIMPLE      | m     | NULL       | ALL  | visit         | NULL | NULL    | NULL | 3082466 |   100.00 | Range checked for each record (index map: 0x1) |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+------------------------------------------------+


Query which takes too long:

Insert into knn_data (SELECT cal.X AS X,
cal.Y AS Y,
cal.BeginTime AS BeginTime,
cal.EndTime AS EndTime,
avg(m.dbm_ant) AS avg_dbm_ant,
m.ant_id AS ant_id,
avg(m.location) AS avg_location,
count(*) AS count,
m.visit
FROM calibration cal
LEFT join calibration_data m
ON m.visit BETWEEN cal.BeginTime AND cal.EndTime
GROUP BY cal.X,
cal.Y,
cal.BeginTime,
cal.BeaconId,
m.ant_id,
m.macHash,
m.visit);


Table knn_data:

CREATE TABLE `knn_data` (
`X` int(11) NOT NULL,
`Y` int(11) NOT NULL,
`BeginTime` datetime NOT NULL,
`EndTIme` datetime NOT NULL,
`avg_dbm_ant` float DEFAULT NULL,
`ant_id` int(11) NOT NULL,
`avg_location` float DEFAULT NULL,
`count` int(11) DEFAULT NULL,
`visit` datetime NOT NULL,
PRIMARY KEY (`ant_id`,`visit`,`X`,`Y`,`BeginTime`,`EndTIme`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;


Table calibration:

BeaconId, X, Y, BeginTime, EndTime
41791, 1698, 3944, 2016-11-12 22:44:00, 2016-11-12 22:49:00


CREATE TABLE `calibration` (
`BeaconId` int(11) DEFAULT NULL,
`X` int(11) DEFAULT NULL,
`Y` int(11) DEFAULT NULL,
`BeginTime` datetime DEFAULT NULL,
`EndTime` datetime DEFAULT NULL,
KEY `x,y` (`X`,`Y`),
KEY `x` (`X`),
KEY `y` (`Y`),
KEY `BID` (`BeaconId`),
KEY `beginTime` (`BeginTime`),
KEY `x,y,beg,bid` (`X`,`Y`,`BeginTime`,`BeaconId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;


Table calibration_data:

macHash, visit, dbm_ant, ant_id, mac, isRand, posX, posY, sources, ip, dayOfMonth, location, am, ar
'f5:dc:7d:73:2d:e9', '2016-11-12 22:44:00', '-87', '381', 'f5:dc:7d:73:2d:e9', NULL, NULL, NULL, NULL, NULL, '12', '18.077636300207715', 'inradius_41791', NULL


CREATE TABLE `calibration_data` (
`macHash` varchar(100) COLLATE utf8_bin NOT NULL,
`visit` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`dbm_ant` int(3) NOT NULL,
`ant_id` int(11) NOT NULL,
`mac` char(17) COLLATE utf8_bin DEFAULT NULL,
`isRand` tinyint(4) DEFAULT NULL,
`posX` double DEFAULT NULL,
`posY` double DEFAULT NULL,
`sources` int(2) DEFAULT NULL,
`ip` int(10) unsigned DEFAULT NULL,
`dayOfMonth` int(11) DEFAULT NULL,
`location` varchar(80) COLLATE utf8_bin DEFAULT NULL,
`am` varchar(300) COLLATE utf8_bin DEFAULT NULL,
`ar` varchar(300) COLLATE utf8_bin DEFAULT NULL,
KEY `visit` (`visit`),
KEY `macHash` (`macHash`),
KEY `ant, time` (`dbm_ant`,`visit`),
KEY `beacon` (`am`),
KEY `ant_id` (`ant_id`),
KEY `ant,mH,visit` (`ant_id`,`macHash`,`visit`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

Answer

That's a nasty, classic problem with "range" joins: the optimiser doesn't use your indexes and ends up doing a full table scan. You can see this in your EXPLAIN plan, where the type column shows ALL.

Ideally you would see type=range and an index name in the key column.
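A quick way to check whether the `visit` index is usable at all is to EXPLAIN the lookup for a single calibration window in isolation. This is only a sketch: the timestamps are the BeginTime/EndTime values from the sample calibration row in the question, and the select list is trimmed for brevity. If the index can be used, the plan should show type=range and visit in the key column; if it still shows ALL, the optimiser considers a scan cheaper even for one window.

```sql
-- Plan for one calibration window (bounds copied from the sample calibration row)
EXPLAIN
SELECT m.ant_id, m.macHash, m.visit, m.dbm_ant
FROM   calibration_data m
WHERE  m.visit BETWEEN '2016-11-12 22:44:00' AND '2016-11-12 22:49:00';
```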

Some ideas:


I doubt that changing your join condition from

ON m.visit BETWEEN cal.BeginTime AND cal.EndTime 

to

ON m.visit >= cal.BeginTime AND m.visit <= cal.EndTime

will help, but it's still worth a try.
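Both conditions select exactly the same rows, because BETWEEN is inclusive at both ends; the rewrite can only change how the optimiser plans the join, not the result. A throwaway check, using a hypothetical temporary table `t` that is not part of the poster's schema:

```sql
-- Hypothetical scratch table, only used to compare the two predicates
CREATE TEMPORARY TABLE t (visit DATETIME NOT NULL);

INSERT INTO t VALUES
  ('2016-11-12 22:43:59'),  -- just before the window
  ('2016-11-12 22:44:00'),  -- equal to BeginTime
  ('2016-11-12 22:49:00'),  -- equal to EndTime
  ('2016-11-12 22:49:01');  -- just after the window

-- Both queries include the two boundary rows, so each returns 2
SELECT COUNT(*) FROM t
WHERE  visit BETWEEN '2016-11-12 22:44:00' AND '2016-11-12 22:49:00';

SELECT COUNT(*) FROM t
WHERE  visit >= '2016-11-12 22:44:00' AND visit <= '2016-11-12 22:49:00';

DROP TEMPORARY TABLE t;
```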


Run ANALYZE TABLE on both tables. This updates the index statistics and might help the optimiser make the right decision (i.e. use the indexes).
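For example (ANALYZE TABLE accepts a comma-separated list of tables), followed by a fresh EXPLAIN to see whether the type and key columns changed. The EXPLAIN is a trimmed-down version of the query's join, kept short for illustration:

```sql
-- Recalculate the index statistics for both tables used by the join
ANALYZE TABLE calibration, calibration_data;

-- Re-check the plan of the join (select list trimmed for brevity)
EXPLAIN
SELECT cal.BeaconId, m.ant_id, m.visit
FROM   calibration cal
       LEFT JOIN calibration_data m
         ON m.visit BETWEEN cal.BeginTime AND cal.EndTime;
```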


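Rewriting the query as follows might also help force the optimiser to use the indexes. Be aware that the condition on m in the WHERE clause discards calibration rows without any matching calibration_data, so the LEFT JOIN effectively becomes an inner join: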

Insert into knn_data (SELECT cal.X           AS X,
        cal.Y           AS Y,
        cal.BeginTime   AS BeginTime,
        cal.EndTime     AS EndTime,
        avg(m.dbm_ant)  AS avg_dbm_ant,
        m.ant_id        AS ant_id,
        avg(m.location) AS avg_location,
        count(*)        AS count,
        m.visit
 FROM   calibration cal
        LEFT join calibration_data m
          ON m.visit >= cal.BeginTime
 WHERE m.visit <= cal.EndTime
 GROUP  BY cal.X,
           cal.Y,
           cal.BeginTime,
           cal.BeaconId,
           m.ant_id,
           m.macHash,
           m.visit);
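Another way to push the optimiser toward the index, not something tried in the question and only a sketch, is MySQL's FORCE INDEX hint on calibration_data, naming the existing `visit` index. This version keeps both bounds in the ON clause, as the original query does, so the LEFT JOIN semantics are unchanged. Whether it is actually faster depends on the data, so run EXPLAIN on the SELECT part first and check that key shows visit:

```sql
INSERT INTO knn_data
SELECT cal.X,
       cal.Y,
       cal.BeginTime,
       cal.EndTime,
       AVG(m.dbm_ant),
       m.ant_id,
       AVG(m.location),
       COUNT(*),
       m.visit
FROM   calibration cal
       -- ask MySQL to use the existing `visit` index when reading calibration_data
       LEFT JOIN calibration_data m FORCE INDEX (visit)
         ON  m.visit >= cal.BeginTime
         AND m.visit <= cal.EndTime   -- both bounds stay in ON, as in the original query
GROUP  BY cal.X, cal.Y, cal.BeginTime, cal.BeaconId,
          m.ant_id, m.macHash, m.visit;
```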

That's all I can think of...
