Webeng Webeng - 7 months ago 12
SQL Question

Understanding this SQL LEFT JOIN (with IS NULL) example

In a database, there is a table called category:

CREATE TABLE category(
category_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
parent INT DEFAULT NULL
);


It was created to make an Adjacent List Model Tree. Here is what is inside the table currently:

enter image description here

In this example (you can find the source link at the bottom), the following sql code was used to obtain of the "leaf" elements in the table, with "leaf" elements being the rows in the table that did not have any other row use their "category_id" in the "parent" column. The following code was run:

SELECT t1.name FROM
category AS t1 LEFT JOIN category as t2
ON t1.category_id = t2.parent
WHERE t2.category_id IS NULL;


The result of the previous SQL code gives this result:

enter image description here

For example, there is no row in the table that has the value 3 inside the parent column, hence TUBE (with category_id == 3) is a "leaf" element.

QUESTION: Why did that sql code logically give this result? I'm happy that it does since it is what I needed, but I can't wrap my head around the reasoning behind it.




source of the example: http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/

Answer

Whenever you see this pattern:

SELECT    ...
FROM      TableA
LEFT JOIN TableB ON TableA.column = TableB.column
WHERE     TableB.id is null

Think: "find rows in TableA that do not exist in TableB`.

Your query means "find categories that have no child". It's easier to understand if you remove the WHERE clause and add a few more columns to the query:

SELECT      t1.category_id,
            t1.name,
            t2.name  AS ChildName
FROM        category AS t1
LEFT JOIN   category as t2 ON t1.category_id = t2.parent

Here's what happens in your query:

  1. Start with the category table, alias it as t1 and t2. I'll refer to the aliases from now on.
  2. For each row in t1, find all records in t2 that identifies that t1 row as their parent
  3. If a t1 row has no child, t2.category_id will be null
  4. We want to filter only for t1 rows that have no child