Oleg Prutz - 1 year ago 57
SQL Question

# How SQL LEFT JOINs with compositie join conditions work

Suppose I want to select the rows holding the group-wise maximum of a column like described in this article http://dev.mysql.com/doc/refman/5.7/en/example-maximum-column-group-row.html . It does solve the problem. However I don't understand how the last solution works. If I do all the steps described, I get the desired result:

``````CREATE TABLE shop (
article INT(4) UNSIGNED ZEROFILL DEFAULT '0000' NOT NULL,
dealer  CHAR(20)                 DEFAULT ''     NOT NULL,
price   DOUBLE(16,2)             DEFAULT '0.00' NOT NULL,
PRIMARY KEY(article, dealer));

INSERT INTO shop VALUES
(1,'A',3.45),(1,'B',3.99),(2,'A',10.99),(3,'B',1.45),
(3,'C',1.69),(3,'D',1.25),(4,'D',19.95);

SELECT s1.*
FROM shop s1
LEFT JOIN shop s2 ON s1.article = s2.article AND s1.price < s2.price
WHERE s2.price IS NULL;

+---------+--------+-------+
| article | dealer | price |
+---------+--------+-------+
|    0001 | A      |  3.45 |
|    0001 | B      |  3.99 |
|    0002 | A      | 10.99 |
|    0003 | B      |  1.45 |
|    0003 | C      |  1.69 |
|    0003 | D      |  1.25 |
|    0004 | D      | 19.95 |
+---------+--------+-------+
``````

But I don't understand how this very result was obtained using
`LEFT JOIN`
and composite join condition. If I try the same query without
`WHERE`
clause, I get this:

``````SELECT *
FROM shop s1
LEFT JOIN shop s2 ON s1.article = s2.article AND s1.price < s2.price;

+---------+--------+-------+---------+--------+-------+
| article | dealer | price | article | dealer | price |
+---------+--------+-------+---------+--------+-------+
|    0001 | A      |  3.45 |    0001 | B      |  3.99 |
|    0001 | B      |  3.99 |    NULL | NULL   |  NULL |
|    0002 | A      | 10.99 |    NULL | NULL   |  NULL |
|    0003 | B      |  1.45 |    0003 | C      |  1.69 |
|    0003 | C      |  1.69 |    NULL | NULL   |  NULL |
|    0003 | D      |  1.25 |    0003 | B      |  1.45 |
|    0003 | D      |  1.25 |    0003 | C      |  1.69 |
|    0004 | D      | 19.95 |    NULL | NULL   |  NULL |
+---------+--------+-------+---------+--------+-------+
``````

Obviously, the first query just took the rows which have
`NULL`
s which happen to contain the values needed. What is less clear is how certain rows which do not satisfy the join condition get included in the result set and have
`NULL`
s in them while another rows are not included at all.
If I remove
`s1.price < s2.price`
condition the results look like this:

``````SELECT *
FROM shop s1
LEFT JOIN shop s2 ON s1.article = s2.article;

+---------+--------+-------+---------+--------+-------+
| article | dealer | price | article | dealer | price |
+---------+--------+-------+---------+--------+-------+
|    0001 | A      |  3.45 |    0001 | A      |  3.45 |
|    0001 | A      |  3.45 |    0001 | B      |  3.99 |
|    0001 | B      |  3.99 |    0001 | A      |  3.45 |
|    0001 | B      |  3.99 |    0001 | B      |  3.99 |
|    0002 | A      | 10.99 |    0002 | A      | 10.99 |
|    0003 | B      |  1.45 |    0003 | B      |  1.45 |
|    0003 | B      |  1.45 |    0003 | C      |  1.69 |
|    0003 | B      |  1.45 |    0003 | D      |  1.25 |
|    0003 | C      |  1.69 |    0003 | B      |  1.45 |
|    0003 | C      |  1.69 |    0003 | C      |  1.69 |
|    0003 | C      |  1.69 |    0003 | D      |  1.25 |
|    0003 | D      |  1.25 |    0003 | B      |  1.45 |
|    0003 | D      |  1.25 |    0003 | C      |  1.69 |
|    0003 | D      |  1.25 |    0003 | D      |  1.25 |
|    0004 | D      | 19.95 |    0004 | D      | 19.95 |
+---------+--------+-------+---------+--------+-------+
``````

As far as I know,
`INNER JOIN`
s work this way: first a cross join is performed, then the results are filtered according to the join condition. For outer joins, I'm not sure. So I cannot understand why, for example, rows like

``````|    0001 | A      |  3.45 |    0001 | A      |  3.45 |
``````

are not included at all in the result set of previous query (i.e. when
`s1.price < s2.price`

``````|    0001 | B      |  3.99 |    0001 | A      |  3.45 |
``````

get included with
`NULL`
s in their second halves. So could anybody explain the behaivior of SQL
`JOIN`
s?

By using left join you always get all the rows from left table (s1 in your case). When ON clause condition matches for particular row from s1 join result will include it as many times as match with s2 will happen After that where condition applies to result set (if any)

You should also keep in mind, that conditional operators like =, > always return false if at least one of arguments is NULL

So row

``````|    0001 | A      |  3.45 |    0001 | A      |  3.45 |
``````

``````|    0001 | B      |  3.99 |    0001 | A      |  3.45 |