Syafiq Kamarul Azman Syafiq Kamarul Azman - 4 months ago 13
SQL Question

Are these SQL statements equal?

I have the following table:

| sample_id (varchar, unique) | field1 (int) | field2 (int) | ...
--------------------------------------------------------------------
| 9b7acb476c4ab04c7ddbc | 100 | 56 | ...
| a2e4df67e98ccaf088abf | 23 | NULL | ...
| fcbe9cecd6b96cba7c6ee | NULL | 43 | ...
...


I have the following code created by a prior user to query two fields at the same time and getting a random subset of the rows:

SELECT sample_id, field1, field2
FROM samples
WHERE field1 != NULL
UNION ALL
SELECT sample_id, field1, field2
FROM samples
WHERE field2 != NULL
ORDER BY RAND()
LIMIT 1000


I thought of optimizing the code by rewriting the query as:

SELECT sample_id, field1, field2
FROM samples
WHERE field1 != NULL
OR field2 != NULL
ORDER BY RAND()
LIMIT 1000


Based on some documentation I read here it seems that both the queries are equivalent but I'm not sure how the
ORDER BY RAND()
line would be handled in the query. Is it only applied to the second query (i.e. the query after the
UNION ALL
)?

Answer

[THIS WAS THE ORIGINAL VERSION OF THE QUESTION]

Not at all. != NULL will filter out all data, because almost all comparisons to NULL return NULL, which is treated as false.

!= '' will return all values that do not contain an empty string and are not NULL.

The correct comparisons to NULL use is null and is not null.

[AFTER THE EDIT] The query you want is:

SELECT sample_id, field1, field2
FROM samples
WHERE field1 IS NOT NULL OR field2 IS NOT NULL
ORDER BY RAND()
LIMIT 1000;
Comments