Kazz Kazz - 1 month ago 6
MySQL Question

MySQL confused about IN (CONST vs UNION vs SELECT FROM (UNION))

Can someone please explain why there is big difference between those queries ?

Results of all of them is exactly same.

Performance of

query 1
: very good,
query 2
: bad,
query 3
: good.

Why in
query 2
select from table
test
(id 1) contain all rows ? And why
possible_keys
not contain
PRIMARY
which is actually used ?

Table:

CREATE TABLE `test` (
`id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `test` ADD PRIMARY KEY (`id`);


Data:

DROP PROCEDURE IF EXISTS insert1000;
DELIMITER $$
CREATE PROCEDURE insert1000()
BEGIN
SET @i = 1;
WHILE @i < 1000 DO
INSERT INTO `test` VALUES (@i);
SET @i = @i + 1;
END WHILE;
END
$$
DELIMITER ;
CALL insert1000();
DROP PROCEDURE insert1000;


Query 1:

SELECT `id` FROM `test` WHERE `id` IN (2, 3)


Query 1 explanation:

+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | test | range | PRIMARY | PRIMARY | 4 | NULL | 2 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+


Query 2:

SELECT `id` FROM `test` WHERE `id` IN (SELECT 2 UNION SELECT 3)


Query 2 explanation:

+------+--------------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | PRIMARY | test | index | NULL | PRIMARY | 4 | NULL | 999 | Using where; Using index |
+------+--------------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| 2 | DEPENDENT SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+------+--------------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| 3 | DEPENDENT UNION | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+------+--------------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+------+--------------------+------------+-------+---------------+---------+---------+------+------+--------------------------+


Query 3:

SELECT `id` FROM `test` WHERE `id` IN (SELECT * FROM (SELECT 2 UNION SELECT 3) AS `derived`)


Query 3 explanation:

+------+--------------+-------------+--------+---------------+---------+---------+-----------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+-------------+--------+---------------+---------+---------+-----------+------+--------------------------+
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 2 | |
+------+--------------+-------------+--------+---------------+---------+---------+-----------+------+--------------------------+
| 1 | PRIMARY | test | eq_ref | PRIMARY | PRIMARY | 4 | derived.2 | 1 | Using where; Using index |
+------+--------------+-------------+--------+---------------+---------+---------+-----------+------+--------------------------+
| 2 | MATERIALIZED | <derived3> | ALL | NULL | NULL | NULL | NULL | 2 | |
+------+--------------+-------------+--------+---------------+---------+---------+-----------+------+--------------------------+
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+------+--------------+-------------+--------+---------------+---------+---------+-----------+------+--------------------------+
| 4 | UNION | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+------+--------------+-------------+--------+---------------+---------+---------+-----------+------+--------------------------+
| NULL | UNION RESULT | <union3,4> | ALL | NULL | NULL | NULL | NULL | NULL | |
+------+--------------+-------------+--------+---------------+---------+---------+-----------+------+--------------------------+

Answer

The Inner workings of the MySQL optimizer...

While query 2 and query 3 both require a full table scan (can't use the index), their different syntax makes the optimizer use different strategies.

You can see it more clearly(ish) by running EXPLAIN EXTENDED SELECT ... and then running SHOW WARNINGS;.

Here's the extended plan for query 2:

select `test`.`id` AS `id` 
from `test` 
where <in_optimizer>(`test`.`id`,<exists>(select 2 having (<cache>(`test`.`id`) = <ref_null_helper>(2)) 
                                          union 
                                          select 3 having (<cache>(`test`.`id`) = <ref_null_helper>(3))
                                          ))

The optimizer translates IN to EXISTS and then compares the results of 2 queries SELECT 2 and SELECT 3 to the row that is scanned in test.

Here's the extended plan for query 3:

select `test`.`id` AS `id` 
from `test` 
where <in_optimizer>(`test`.`id`,<exists>(select 1 from (select 2 AS `2` union select 3 AS `3`) `derived` where (<cache>(`test`.`id`) = `derived`.`2`)))

You can see that in this case the optimizer is running your original UNION to create a derived table with the values 2 and 3, and then compares this table once to the data it scans in table test.