Jawad Al Shaikh
MySQL Question

Get the cars that passed specific cameras

MySQL/MariaDB schema and sample data:

CREATE DATABASE IF NOT EXISTS `puzzle` DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_unicode_ci;

USE `puzzle`;

DROP TABLE IF EXISTS `event`;

CREATE TABLE `event` (
`eventId` bigint(20) NOT NULL AUTO_INCREMENT,
`sourceId` bigint(20) NOT NULL COMMENT 'think of source as camera',
`carNumber` varchar(40) NOT NULL COMMENT 'ex: 5849',
`createdOn` datetime DEFAULT NULL,
PRIMARY KEY (`eventId`)
) ENGINE=INNODB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;


INSERT INTO `event` (`eventId`, `sourceId`, `carNumber`, `createdOn`) VALUES
(1, 44, '4456', '2016-09-20 20:24:05'),
(2, 26, '26484', '2016-09-20 20:24:05'),
(3, 5, '4456', '2016-09-20 20:24:06'),
(4, 3, '72704', '2016-09-20 20:24:15'),
(5, 3, '399606', '2016-09-20 20:26:15'),
(6, 5, '4456', '2016-09-20 20:27:25'),
(7, 44, '72704', '2016-09-20 20:29:25'),
(8, 3, '4456', '2016-09-20 20:30:55'),
(9, 44, '26484', '2016-09-20 20:34:55'),
(10, 26, '4456', '2016-09-20 20:35:15'),
(11, 3, '72704', '2016-09-20 20:35:15'),
(12, 3, '399606', '2016-09-20 20:44:35'),
(13, 26, '4456', '2016-09-20 20:49:45');


I want to get the carNumber(s) that have at least one event with sourceId = 3 AND at least one event with sourceId 26 or 44 between 20:24 and 20:45. The query needs to be fast, since the real table contains over 300 million records.

So far, the query below is as far as I could get (it doesn't even produce valid results):

select * from event e where
e.createdOn > '2016-09-20 20:24:00' and e.createdOn < '2016-09-20 20:45:00'
and e.sourceId IN(3,26,44) group by e.carNumber;
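To see why the attempted query returns too many cars, here is a sketch that runs its filter against the sample data. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL/MariaDB, and it selects only carNumber instead of `*`, since non-grouped columns are not meaningful after GROUP BY. The IN list only requires a car to pass any one of the three cameras, so 26484 (seen only by 26 and 44) and 399606 (seen only by 3) come back too.

```python
import sqlite3

# Sample rows from the question: (eventId, sourceId, carNumber, createdOn).
ROWS = [
    (1, 44, '4456', '2016-09-20 20:24:05'),
    (2, 26, '26484', '2016-09-20 20:24:05'),
    (3, 5, '4456', '2016-09-20 20:24:06'),
    (4, 3, '72704', '2016-09-20 20:24:15'),
    (5, 3, '399606', '2016-09-20 20:26:15'),
    (6, 5, '4456', '2016-09-20 20:27:25'),
    (7, 44, '72704', '2016-09-20 20:29:25'),
    (8, 3, '4456', '2016-09-20 20:30:55'),
    (9, 44, '26484', '2016-09-20 20:34:55'),
    (10, 26, '4456', '2016-09-20 20:35:15'),
    (11, 3, '72704', '2016-09-20 20:35:15'),
    (12, 3, '399606', '2016-09-20 20:44:35'),
    (13, 26, '4456', '2016-09-20 20:49:45'),
]

# In-memory SQLite table standing in for the MySQL `event` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (eventId INTEGER PRIMARY KEY, sourceId INTEGER NOT NULL, "
    "carNumber TEXT NOT NULL, createdOn TEXT)"
)
conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?)", ROWS)

# The attempted filter: passing any ONE of the three cameras keeps a car.
attempt = [
    car
    for (car,) in conn.execute(
        """
        SELECT e.carNumber
        FROM event e
        WHERE e.createdOn > '2016-09-20 20:24:00' AND e.createdOn < '2016-09-20 20:45:00'
          AND e.sourceId IN (3, 26, 44)
        GROUP BY e.carNumber
        ORDER BY e.carNumber
        """
    )
]
print(attempt)  # ['26484', '399606', '4456', '72704']
```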


The correct result for the provided data:

carNumber
4456
72704
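The requirement boils down to set logic per car: collect the cameras each car passed inside the window, then keep the cars whose set intersects both {3} and {26, 44}. A minimal Python sketch over the sample rows reproduces the expected result; the names ROWS, GROUP_A and GROUP_B are illustrative, and the window is treated as exclusive at both ends, like the `>`/`<` in the query above. Doing this in application code would mean pulling every row in the window out of a 300-million-row table, so the sketch only pins down the semantics.

```python
from collections import defaultdict

# Sample rows from the question: (eventId, sourceId, carNumber, createdOn).
ROWS = [
    (1, 44, '4456', '2016-09-20 20:24:05'),
    (2, 26, '26484', '2016-09-20 20:24:05'),
    (3, 5, '4456', '2016-09-20 20:24:06'),
    (4, 3, '72704', '2016-09-20 20:24:15'),
    (5, 3, '399606', '2016-09-20 20:26:15'),
    (6, 5, '4456', '2016-09-20 20:27:25'),
    (7, 44, '72704', '2016-09-20 20:29:25'),
    (8, 3, '4456', '2016-09-20 20:30:55'),
    (9, 44, '26484', '2016-09-20 20:34:55'),
    (10, 26, '4456', '2016-09-20 20:35:15'),
    (11, 3, '72704', '2016-09-20 20:35:15'),
    (12, 3, '399606', '2016-09-20 20:44:35'),
    (13, 26, '4456', '2016-09-20 20:49:45'),
]
START, END = '2016-09-20 20:24:00', '2016-09-20 20:45:00'
GROUP_A = {3}        # the car must pass this camera...
GROUP_B = {26, 44}   # ...and at least one of these

# Per car, the set of cameras it passed inside the window.
# 'YYYY-MM-DD HH:MM:SS' strings compare correctly as plain text.
cameras_by_car = defaultdict(set)
for _event_id, source_id, car_number, created_on in ROWS:
    if START < created_on < END:
        cameras_by_car[car_number].add(source_id)

# Keep cars that hit at least one camera from each group.
result = sorted(
    car for car, cams in cameras_by_car.items()
    if cams & GROUP_A and cams & GROUP_B
)
print(result)  # ['4456', '72704']
```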


I am really puzzled and stuck. I tried EXISTS, joins and subqueries without luck, so I wonder whether SQL can solve this at all, or whether I should do it in backend code.

MySQL / MariaDB version in use:

mariadb-5.5.50

mysql-5.5.51

Answer

If you need this to be fast, then the following might work, assuming you have an index on event(createdOn, carNumber, sourceId):

select e.carNumber 
from event e 
where e.createdOn > '2016-09-20 20:24:00' and e.createdOn < '2016-09-20 20:45:00'
group by e.carNumber
having sum(e.sourceId = 3) > 0 and
       sum(e.sourceId IN (26, 44)) > 0;
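The HAVING clause relies on conditional aggregation: a comparison such as `e.sourceId = 3` evaluates to 0 or 1, so `sum(...)` counts a car's matching events, and `> 0` means "at least one". A sketch that runs the query on the sample data, assuming SQLite via Python's sqlite3 as a stand-in (SQLite also treats comparisons as 0/1). The index name is hypothetical, and ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# Sample rows from the question: (eventId, sourceId, carNumber, createdOn).
ROWS = [
    (1, 44, '4456', '2016-09-20 20:24:05'),
    (2, 26, '26484', '2016-09-20 20:24:05'),
    (3, 5, '4456', '2016-09-20 20:24:06'),
    (4, 3, '72704', '2016-09-20 20:24:15'),
    (5, 3, '399606', '2016-09-20 20:26:15'),
    (6, 5, '4456', '2016-09-20 20:27:25'),
    (7, 44, '72704', '2016-09-20 20:29:25'),
    (8, 3, '4456', '2016-09-20 20:30:55'),
    (9, 44, '26484', '2016-09-20 20:34:55'),
    (10, 26, '4456', '2016-09-20 20:35:15'),
    (11, 3, '72704', '2016-09-20 20:35:15'),
    (12, 3, '399606', '2016-09-20 20:44:35'),
    (13, 26, '4456', '2016-09-20 20:49:45'),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (eventId INTEGER PRIMARY KEY, sourceId INTEGER NOT NULL, "
    "carNumber TEXT NOT NULL, createdOn TEXT)"
)
# Hypothetical index name; same column order as the index assumed above.
conn.execute(
    "CREATE INDEX idx_event_created_car_source ON event (createdOn, carNumber, sourceId)"
)
conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?)", ROWS)

# Each SUM counts how many of a car's events match the condition.
QUERY = """
    SELECT e.carNumber
    FROM event e
    WHERE e.createdOn > '2016-09-20 20:24:00' AND e.createdOn < '2016-09-20 20:45:00'
    GROUP BY e.carNumber
    HAVING SUM(e.sourceId = 3) > 0
       AND SUM(e.sourceId IN (26, 44)) > 0
    ORDER BY e.carNumber
"""
result = [car for (car,) in conn.execute(QUERY)]
print(result)  # ['4456', '72704']
```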

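I would be inclined to also filter on sourceId in the WHERE clause, so that only rows from the three relevant cameras reach the aggregation: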

select e.carNumber 
from event e 
where e.createdOn > '2016-09-20 20:24:00' and e.createdOn < '2016-09-20 20:45:00' and
      e.sourceId in (3, 26, 44)
group by e.carNumber
having sum(e.sourceId = 3) > 0 and
       sum(e.sourceId IN (26, 44)) > 0;
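To see what the extra sourceId filter buys, here is a sketch (again assuming SQLite via Python's sqlite3 as a stand-in for MySQL) that counts how many sample rows reach the GROUP BY with and without it, then checks that the filtered query still returns the same cars. On the sample data the filter drops the two source-5 events, so 12 rows shrink to 10; on the real table the saving depends on how many events come from other cameras. WINDOW is a fixed constant interpolated into the SQL for brevity, not user input.

```python
import sqlite3

# Sample rows from the question: (eventId, sourceId, carNumber, createdOn).
ROWS = [
    (1, 44, '4456', '2016-09-20 20:24:05'),
    (2, 26, '26484', '2016-09-20 20:24:05'),
    (3, 5, '4456', '2016-09-20 20:24:06'),
    (4, 3, '72704', '2016-09-20 20:24:15'),
    (5, 3, '399606', '2016-09-20 20:26:15'),
    (6, 5, '4456', '2016-09-20 20:27:25'),
    (7, 44, '72704', '2016-09-20 20:29:25'),
    (8, 3, '4456', '2016-09-20 20:30:55'),
    (9, 44, '26484', '2016-09-20 20:34:55'),
    (10, 26, '4456', '2016-09-20 20:35:15'),
    (11, 3, '72704', '2016-09-20 20:35:15'),
    (12, 3, '399606', '2016-09-20 20:44:35'),
    (13, 26, '4456', '2016-09-20 20:49:45'),
]
# Fixed time-window predicate (a constant, safe to interpolate).
WINDOW = "e.createdOn > '2016-09-20 20:24:00' AND e.createdOn < '2016-09-20 20:45:00'"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (eventId INTEGER PRIMARY KEY, sourceId INTEGER NOT NULL, "
    "carNumber TEXT NOT NULL, createdOn TEXT)"
)
conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?)", ROWS)

# Rows each variant would feed into the GROUP BY.
rows_without_filter = conn.execute(
    f"SELECT COUNT(*) FROM event e WHERE {WINDOW}"
).fetchone()[0]
rows_with_filter = conn.execute(
    f"SELECT COUNT(*) FROM event e WHERE {WINDOW} AND e.sourceId IN (3, 26, 44)"
).fetchone()[0]

# The filtered variant still yields the same cars.
result = [
    car
    for (car,) in conn.execute(
        f"""
        SELECT e.carNumber
        FROM event e
        WHERE {WINDOW} AND e.sourceId IN (3, 26, 44)
        GROUP BY e.carNumber
        HAVING SUM(e.sourceId = 3) > 0 AND SUM(e.sourceId IN (26, 44)) > 0
        ORDER BY e.carNumber
        """
    )
]
print(rows_without_filter, rows_with_filter, result)  # 12 10 ['4456', '72704']
```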

And then, for performance, even this, which splits the work into one subquery per camera:

select carNumber
from ((select carNumber, sourceId
       from event e
       where e.sourceId = 3 and
             e.createdOn > '2016-09-20 20:24:00' and e.createdOn < '2016-09-20 20:45:00'
      ) union all
      (select carNumber, sourceId
       from event e
       where e.sourceId = 26 and
             e.createdOn > '2016-09-20 20:24:00' and e.createdOn < '2016-09-20 20:45:00'
      ) union all
      (select carNumber, sourceId
       from event e
       where e.sourceId = 44 and
             e.createdOn > '2016-09-20 20:24:00' and e.createdOn < '2016-09-20 20:45:00'
      )
     ) e
group by e.carNumber
having sum(e.sourceId = 3) > 0 and
       sum(e.sourceId IN (26, 44)) > 0;

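This version can take advantage of an index on event(sourceId, createdOn, carNumber). Each subquery has an equality condition on sourceId and a range condition on createdOn, which match the leading columns of that index, so each one should use it very effectively (carNumber is also in the index, so the table rows need not be read). Together they bring a smallish amount of data to the final aggregation.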
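A sketch of the UNION ALL version under the same assumption (SQLite via Python's sqlite3 standing in for MySQL), with a hypothetically named index on (sourceId, createdOn, carNumber). SQLite does not accept parentheses around the individual SELECTs of a UNION ALL, so they are dropped here, and the time bounds are bound once as named parameters. SQLite's EXPLAIN QUERY PLAN should then report each branch searching that index; MySQL's own EXPLAIN output looks different and is not shown.

```python
import sqlite3

# Sample rows from the question: (eventId, sourceId, carNumber, createdOn).
ROWS = [
    (1, 44, '4456', '2016-09-20 20:24:05'),
    (2, 26, '26484', '2016-09-20 20:24:05'),
    (3, 5, '4456', '2016-09-20 20:24:06'),
    (4, 3, '72704', '2016-09-20 20:24:15'),
    (5, 3, '399606', '2016-09-20 20:26:15'),
    (6, 5, '4456', '2016-09-20 20:27:25'),
    (7, 44, '72704', '2016-09-20 20:29:25'),
    (8, 3, '4456', '2016-09-20 20:30:55'),
    (9, 44, '26484', '2016-09-20 20:34:55'),
    (10, 26, '4456', '2016-09-20 20:35:15'),
    (11, 3, '72704', '2016-09-20 20:35:15'),
    (12, 3, '399606', '2016-09-20 20:44:35'),
    (13, 26, '4456', '2016-09-20 20:49:45'),
]
PARAMS = {"start": "2016-09-20 20:24:00", "end": "2016-09-20 20:45:00"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (eventId INTEGER PRIMARY KEY, sourceId INTEGER NOT NULL, "
    "carNumber TEXT NOT NULL, createdOn TEXT)"
)
# Hypothetical index name; equality column first, then the range column.
conn.execute(
    "CREATE INDEX idx_event_source_created ON event (sourceId, createdOn, carNumber)"
)
conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?)", ROWS)

# One branch per camera: sourceId = const AND a createdOn range.
QUERY = """
    SELECT e.carNumber
    FROM (
        SELECT carNumber, sourceId FROM event
        WHERE sourceId = 3 AND createdOn > :start AND createdOn < :end
        UNION ALL
        SELECT carNumber, sourceId FROM event
        WHERE sourceId = 26 AND createdOn > :start AND createdOn < :end
        UNION ALL
        SELECT carNumber, sourceId FROM event
        WHERE sourceId = 44 AND createdOn > :start AND createdOn < :end
    ) e
    GROUP BY e.carNumber
    HAVING SUM(e.sourceId = 3) > 0 AND SUM(e.sourceId IN (26, 44)) > 0
    ORDER BY e.carNumber
"""
result = [car for (car,) in conn.execute(QUERY, PARAMS)]

# The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, PARAMS)]

print(result)  # ['4456', '72704']
for detail in plan:
    print(detail)
```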