stUrb stUrb - 5 months ago 28
SQL Question

Select rows which are not present in other table

I've got two tables:

login_log :: ip | etc.
ip_location :: ip | location | hostname | etc.


I want to get every IP address from
login_log
which doesn't have a row in
ip_location
.

I tried this query but it doesn't work:

SELECT login_log.ip
FROM login_log
WHERE NOT EXIST (SELECT ip_location.ip
FROM ip_location
WHERE login_log.ip = ip_location.ip)



ERROR: syntax error at or near "SELECT"
LINE 3: WHERE NOT EXIST (SELECT ip_location.ip`



I'm also wondering if this query (with adjustments to make it work) is the best performing query for this purpose.

Answer

There are basically 4 techniques for this task, all of them standard SQL.

NOT EXISTS

Often, this is fastest in Postgres.

SELECT ip 
FROM   login_log l 
WHERE  NOT EXISTS (
   SELECT 1              -- it's mostly irrelevant what you put here
   FROM   ip_location i
   WHERE  l.ip = i.ip
   );

Also consider:

LEFT JOIN / IS NULL

Sometimes this is fastest. Often shortest.

SELECT l.ip 
FROM   login_log l 
LEFT   JOIN ip_location i USING (ip)  -- short for: ON i.ip = l.ip
WHERE  i.ip IS NULL;

EXCEPT

Short. Not as easily integrated in more complex queries.

SELECT ip 
FROM   login_log

EXCEPT ALL               -- ALL, to keep duplicate rows and make it faster
SELECT ip
FROM   ip_location;

Note that (per documentation):

duplicates are eliminated unless EXCEPT ALL is used.

Typically, you'll want the ALL keyword. If you don't care, still use it because it makes the query faster.

NOT IN

Only good for small sets without NULL values or if you know to handle NULL properly! I would not use it for this purpose. Performance deteriorates with bigger tables.

SELECT ip 
FROM   login_log
WHERE  ip NOT IN (
   SELECT DISTINCT ip  -- DISTINCT is optional
   FROM   ip_location
   );

NOT IN carries a "trap" for NULL values on either side:

Similar question on dba.SE targeted at MySQL: