So I'm testing 2 queries and I'm getting different results. I want to correct/patch up my understanding. Here's two generic SQL queries that to my understanding are the same but when executed are not. Note this is not a question about diff between ANSI and non-ANSI SQL.
Query 1 (using
SELECT * FROM person p LEFT JOIN person_log pl
ON p.person_id = pl.person_id
WHERE pl.person_id IS NULL
AND p.is_active = 1;
SELECT * FROM person
WHERE person.is_active = 1
AND person_id NOT IN (SELECT person_id FROM person_log);
Both queries get you the same data.
The second query is the straight-forward way to the problem; get all persons that have no entry in person_log. You can do the same with a
NOT EXISTS clause instead of a
NOT IN clause. (
NOT IN is a bit leaner, but the values you select in the subquery must not be null, for otherwise you see no data at all. I usually perfer
NOT IN over
NOT EXISTS for their simplicity, but that's a matter of personal preference.
The first query is called an anti join. It is a trick to achieve the same as a
NOT EXISTS or
NOT IN query on weak database systems that don't implement these methods well. (The reason is that when a new database system is written, the programmers usually put all their effort in joins for they are so important and neglect
IN for some time.)
It depends on the DBMS which gets executed fastest,
NOT EXISTS or the anti join. The ideal DBMS would get to the same execution plan, no matter which syntax you choose.
The anti join can produce large intermediate results. With a mature DBMS you shouldn't use anti joins for this reason and for mere readability.