Jerry Jerry - 17 days ago 5
MySQL Question

NOT IN query takes too long to respond

I am having a table with around 20000 records. I am running this query

SELECT DISTINCT t.ci_record_id FROM `ip_connection` t WHERE
t.remote_ip NOT IN (
SELECT DISTINCT t1.ipAddress FROM ci_table t1 WHERE
t1.blueprint_id IN (SELECT t2.id FROM blueprints t2 WHERE t2.ci_part=FALSE)
AND t1.archive=FALSE
);


Same query if I run with IN clause it will run within seconds and will return 18000 records. NOT IN query just hangs the system.

Answer

You might be able to use the LEFT JOIN ... IS NULL pattern to give this a boost. Here's something to try.

SELECT DISTINCT t.ci_record_id 
  FROM ip_connection t
  LEFT JOIN (
         SELECT DISTINCT t1.ipAddress 
           FROM ci_table t1 
           JOIN blueprints t2    ON t1.blueprint_id = t2.i2
                                AND t2.ci_part=FALSE
          WHERE t1.archive = FALSE
        ) ta ON t.remote_ip = ta.ipAddress
  WHERE ta.ipAddress IS NULL

What I've done here is factored out your dependent subquery (the one in your NOT IN) clause and made it into an independent subquery, like so.

         SELECT DISTINCT t1.ipAddress 
           FROM ci_table t1 
           JOIN blueprints t2    ON t1.blueprint_id = t2.i2
                                AND t2.ci_part=FALSE
          WHERE t1.archive = FALSE

You should be able to test this subquery independently. It should yield the list of unarchived t1 items corresponding to the list of unarchived blueprint items.

Then, I LEFT JOINed it to your toplevel table, then I looked for NULL items on the right side of the join. Those NULL items correspond to rows on the left side of the join that didn't match the ON clause. That's the independent subquery way of doing NOT IN.

This is likely to help you, because MySQL's query planner is a little bit naive about dependent subqueries and sometimes repeats them until the sun becomes a white dwarf star.

Comments