zfm - 6 months ago 14

MySQL Question

I hava two very similar queries, however they return different result.

The first one:

`select * from products p`

where p.val >= 999999 and

not exists

(select * from products p2 where p2.val < 999999 and p.user_id = p2.user_id);

The second one:

`select * from products p`

where p.val >= 999999 and

p.user_id not in (select user_id from products p2 where p2.val < 999999);

The first one gave me the right answer, while the second one gave me no (zero) result. Is it possible that this happened because the subquery in the second query gave too many results?

Answer

Beware of nulls!

If there's a NULL in the sub-query, the NOT IN will not work as most people expect.

The issue will be clearer if you translate `NOT IN (...)`

either to `NOT (... OR ...)`

or to `NOT ... AND NOT ...`

and apply the three-valued logic to the resulting expression.

To illustrate this with an example, let's say the condition is `NOT IN (1, 2, NULL)`

and the row being checked has a value of 3.

Using `NOT (... OR ...)`

you get this:

```
NOT (3=1 OR 3=2 OR 3=NULL)
```

The first two conditions in the brackets are false the last one is unknown. Based on the three-valued logic, the result of the disjunction will be unknown. Inversion of an unknown is also unknown, according to that same logic. The result of unknown in a WHERE clause is treated same as the result of false, i.e. a no-match. So, here you are.

Now, if you rewrite the NOT IN with `NOT ... AND NOT ...`

, this is what you get:

```
NOT 3=1 AND NOT 3=2 AND NOT 3=NULL
```

The first two terms are true, the last one is unknown (`3=NULL`

is unknown, its inversion is unknown as well). Again, the three-valued logic says the final result is unknown in this case. Here you are again.

So, when a row has a value that is not in the subset but the subset also contains nulls, either do not use NOT IN or filter out the nulls.