Oleg Prutz Oleg Prutz - 2 months ago 6
SQL Question

How SQL LEFT JOINs with compositie join conditions work

Suppose I want to select the rows holding the group-wise maximum of a column like described in this article http://dev.mysql.com/doc/refman/5.7/en/example-maximum-column-group-row.html . It does solve the problem. However I don't understand how the last solution works. If I do all the steps described, I get the desired result:

CREATE TABLE shop (
article INT(4) UNSIGNED ZEROFILL DEFAULT '0000' NOT NULL,
dealer CHAR(20) DEFAULT '' NOT NULL,
price DOUBLE(16,2) DEFAULT '0.00' NOT NULL,
PRIMARY KEY(article, dealer));

INSERT INTO shop VALUES
(1,'A',3.45),(1,'B',3.99),(2,'A',10.99),(3,'B',1.45),
(3,'C',1.69),(3,'D',1.25),(4,'D',19.95);

SELECT s1.*
FROM shop s1
LEFT JOIN shop s2 ON s1.article = s2.article AND s1.price < s2.price
WHERE s2.price IS NULL;

+---------+--------+-------+
| article | dealer | price |
+---------+--------+-------+
| 0001 | A | 3.45 |
| 0001 | B | 3.99 |
| 0002 | A | 10.99 |
| 0003 | B | 1.45 |
| 0003 | C | 1.69 |
| 0003 | D | 1.25 |
| 0004 | D | 19.95 |
+---------+--------+-------+


But I don't understand how this very result was obtained using
LEFT JOIN
and composite join condition. If I try the same query without
WHERE
clause, I get this:

SELECT *
FROM shop s1
LEFT JOIN shop s2 ON s1.article = s2.article AND s1.price < s2.price;

+---------+--------+-------+---------+--------+-------+
| article | dealer | price | article | dealer | price |
+---------+--------+-------+---------+--------+-------+
| 0001 | A | 3.45 | 0001 | B | 3.99 |
| 0001 | B | 3.99 | NULL | NULL | NULL |
| 0002 | A | 10.99 | NULL | NULL | NULL |
| 0003 | B | 1.45 | 0003 | C | 1.69 |
| 0003 | C | 1.69 | NULL | NULL | NULL |
| 0003 | D | 1.25 | 0003 | B | 1.45 |
| 0003 | D | 1.25 | 0003 | C | 1.69 |
| 0004 | D | 19.95 | NULL | NULL | NULL |
+---------+--------+-------+---------+--------+-------+


Obviously, the first query just took the rows which have
NULL
s which happen to contain the values needed. What is less clear is how certain rows which do not satisfy the join condition get included in the result set and have
NULL
s in them while another rows are not included at all.
If I remove
s1.price < s2.price
condition the results look like this:

SELECT *
FROM shop s1
LEFT JOIN shop s2 ON s1.article = s2.article;

+---------+--------+-------+---------+--------+-------+
| article | dealer | price | article | dealer | price |
+---------+--------+-------+---------+--------+-------+
| 0001 | A | 3.45 | 0001 | A | 3.45 |
| 0001 | A | 3.45 | 0001 | B | 3.99 |
| 0001 | B | 3.99 | 0001 | A | 3.45 |
| 0001 | B | 3.99 | 0001 | B | 3.99 |
| 0002 | A | 10.99 | 0002 | A | 10.99 |
| 0003 | B | 1.45 | 0003 | B | 1.45 |
| 0003 | B | 1.45 | 0003 | C | 1.69 |
| 0003 | B | 1.45 | 0003 | D | 1.25 |
| 0003 | C | 1.69 | 0003 | B | 1.45 |
| 0003 | C | 1.69 | 0003 | C | 1.69 |
| 0003 | C | 1.69 | 0003 | D | 1.25 |
| 0003 | D | 1.25 | 0003 | B | 1.45 |
| 0003 | D | 1.25 | 0003 | C | 1.69 |
| 0003 | D | 1.25 | 0003 | D | 1.25 |
| 0004 | D | 19.95 | 0004 | D | 19.95 |
+---------+--------+-------+---------+--------+-------+


As far as I know,
INNER JOIN
s work this way: first a cross join is performed, then the results are filtered according to the join condition. For outer joins, I'm not sure. So I cannot understand why, for example, rows like

| 0001 | A | 3.45 | 0001 | A | 3.45 |


are not included at all in the result set of previous query (i.e. when
s1.price < s2.price
is added) while

| 0001 | B | 3.99 | 0001 | A | 3.45 |


get included with
NULL
s in their second halves. So could anybody explain the behaivior of SQL
JOIN
s?

Answer

By using left join you always get all the rows from left table (s1 in your case). When ON clause condition matches for particular row from s1 join result will include it as many times as match with s2 will happen After that where condition applies to result set (if any)

You should also keep in mind, that conditional operators like =, > always return false if at least one of arguments is NULL

So row

|    0001 | A      |  3.45 |    0001 | A      |  3.45 |

not added because it not matches your clause s1.price < s2.price

as well as row

|    0001 | B      |  3.99 |    0001 | A      |  3.45 |

is added, only when you don't check prices like s1.price < s2.price, joining only using article.

Comments