Peter Bosch Peter Bosch - 28 days ago 5
MySQL Question

Query with LEFT JOIN on large table really slow

The following query takes approximately 12 seconds to execute. I have tried optimizing but was not able to. The table to be joined is pretty large (> 8.000.000 records).

SELECT
p0_.id AS id_0,
p0_.ean AS ean_1,
p0_.brand AS brand_2,
p0_.type AS type_3,
p0_.retail_price AS retail_price_4,
p0_.target_price AS target_price_5,
min(NULLIF(c1_.delivery_price, 0)) AS sclr_6,
COALESCE(((p0_.target_price - min(NULLIF(c1_.delivery_price, 0))) / p0_.target_price * -100), 0) AS sclr_7
FROM product p0_
LEFT JOIN crawl c1_ ON (
c1_.product_ean = p0_.ean AND (
c1_.crawl_date = p0_.last_crawl_date OR
p0_.last_crawl_date IS NULL
)
AND c1_.source_id IN (
SELECT o2_.source_id AS sclr_8
FROM organisation_source o2_
WHERE o2_.organisation_id = 5
)
)
WHERE p0_.organisation_id = 5 GROUP BY p0_.ean


I already tried writing the query in a lot of different ways, but unfortunately did not give me any performance win. If I remove the subquery in the last AND it does not help either.

See below the output of the EXPLAIN statement:

+------+--------------+-------+------+---------------------------------------------------+------------------+---------+------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+-------+------+---------------------------------------------------+------------------+---------+------------------------+--------+-------------+
| 1 | PRIMARY | p0_ | ref | uniqueConstraint,IDX_D34A04AD9E6B1585 | uniqueConstraint | 5 | const | 69 | Using where |
| 1 | PRIMARY | c1_ | ref | IDX_product_ean,IDX_crawl_date | IDX_product_ean | 62 | admin_pricev-p.p0_.ean | 468459 | Using where |
| 2 | MATERIALIZED | o2_ | ref | PRIMARY,IDX_DD91A56E9E6B1585,IDX_DD91A56E953C1C61 | PRIMARY | 4 | const | 1 | Using index |
+------+--------------+-------+------+---------------------------------------------------+------------------+---------+------------------------+--------+-------------+


See below the CREATE TABLE statements of the product and crawl tabel:

CREATE TABLE `product` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`organisation_id` int(11) DEFAULT NULL,
`ean` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
`brand` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`type` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`retail_price` decimal(10,2) NOT NULL,
`target_price` decimal(10,2) NOT NULL,
`last_crawl_date` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `uniqueConstraint` (`organisation_id`,`ean`),
KEY `IDX_D34A04AD9E6B1585` (`organisation_id`),
KEY `IDX_target_price` (`target_price`),
KEY `IDX_ean` (`ean`),
KEY `IDX_type` (`type`),
KEY `IDX_last_crawl_date` (`last_crawl_date`),
CONSTRAINT `FK_D34A04AD9E6B1585` FOREIGN KEY (`organisation_id`) REFERENCES `organisation` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=927 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

CREATE TABLE `crawl` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`source_id` int(11) DEFAULT NULL,
`store_id` int(11) DEFAULT NULL,
`product_ean` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
`crawl_date` datetime NOT NULL,
`takeaway_price` decimal(10,2) DEFAULT NULL,
`delivery_price` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `IDX_B4E9F1C2953C1C61` (`source_id`),
KEY `IDX_B4E9F1C2B092A811` (`store_id`),
KEY `IDX_product_ean` (`product_ean`),
KEY `IDX_takeaway_price` (`takeaway_price`),
KEY `IDX_crawl_date` (`crawl_date`),
CONSTRAINT `FK_B4E9F1C2953C1C61` FOREIGN KEY (`source_id`) REFERENCES `source` (`id`),
CONSTRAINT `FK_B4E9F1C2B092A811` FOREIGN KEY (`store_id`) REFERENCES `store` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=8606874 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci


Anyone has any idea how to improve the performance of this query? Many thanks! If more information is needed please let me know!

Answer Source

You can probably simplify the query to:

SELECT . . .
FROM product p0_  LEFT JOIN
     crawl c1_
     ON c1_.product_ean = p0_.ean AND 
        c1_.crawl_date = p0_.last_crawl_date AND
        EXISTS (SELECT 1
                FROM organisation_source o2_ 
                WHERE o2_.organisation_id = 5 AND c1_.source_id = o2_.source_id 
               )
WHERE p0_.organisation_id = 5
GROUP BY p0_.ean;

The p0_.last_crawl_date IS NULL is presumably unnecessary. A LEFT JOIN will keep all rows in the first table even when there is a NULL in a comparison. Your logic matches all rows in the second table (that meet the other conditions). That may be what you want, but I am guessing not.

In MySQL, exists is sometimes faster than in, which is why I've rewritten that portion.

For this query, you can speed it up using indexes: product(organisation_id, ean, last_crawl_date), crawl(product_ean, crawl_date, source_id) and organisation_source(source_id, organisation_id).