Sticky Sticky - 3 months ago 10
MySQL Question

Difference between these two SQL queries

So I'm testing 2 queries and I'm getting different results. I want to correct/patch up my understanding. Here's two generic SQL queries that to my understanding are the same but when executed are not. Note this is not a question about diff between ANSI and non-ANSI SQL.

Query 1 (using


SELECT * FROM person p LEFT JOIN person_log pl
ON p.person_id = pl.person_id
WHERE pl.person_id IS NULL
AND p.is_active = 1;

Query 2 (using 2 queries):

SELECT * FROM person
WHERE person.is_active = 1
AND person_id NOT IN (SELECT person_id FROM person_log);

To my understanding, both represent this in venn diagram form. Also, is one more efficient than the other? A query on
results vs 2 queries?

EDIT: Changed
in query 1. Thanks to @Justin Samuel for spotting the
error that's causing different results!


Both queries get you the same data.

The second query is the straight-forward way to the problem; get all persons that have no entry in person_log. You can do the same with a NOT EXISTS clause instead of a NOT IN clause. (NOT IN is a bit leaner, but the values you select in the subquery must not be null, for otherwise you see no data at all. I usually perfer IN / NOT IN over EXISTS / NOT EXISTS for their simplicity, but that's a matter of personal preference.

The first query is called an anti join. It is a trick to achieve the same as a NOT EXISTS or NOT IN query on weak database systems that don't implement these methods well. (The reason is that when a new database system is written, the programmers usually put all their effort in joins for they are so important and neglect EXISTS and IN for some time.)

It depends on the DBMS which gets executed fastest, NOT IN, NOT EXISTS or the anti join. The ideal DBMS would get to the same execution plan, no matter which syntax you choose.

The anti join can produce large intermediate results. With a mature DBMS you shouldn't use anti joins for this reason and for mere readability.