D. Stewart D. Stewart - 3 months ago 13
SQL Question

SQL simplifying an except query

I have a database with around 50 million entries showing the status of a device for a given day, simplified to the form:

 id | status
-------------
1 | Off
1 | Off
1 | On
2 | Off
2 | Off
3 | Off
3 | Off
3 | On
...


such that each id is guaranteed to have at least 2 rows with an 'off' status, but doesn't have to have an 'on' status. I'm trying to get a list of only the ids that do not have an 'On' status. For example, in the above data set I'd want a query returned with only '2'

The current query is:

SELECT DISTINCT id FROM table

EXCEPT

SELECT DISTINCT id FROM table WHERE status <> 'Off'


Which seems to work, but it's having to iterate over the entire table twice which ends up taking ~10-12 minutes to run per query. Is there a simpler way to do this with only a single query?

Answer

You can use WHERE NOT EXISTS instead:

Select Distinct Id
From   Table  A
Where Not Exists
(
    Select  *
    From    Table  B
    Where   A.Id = B.Id
    And     B.Status = 'On'
)

I would also recommend looking at the indexes on the Status column. 10-12 minutes to run is excessively long. Even with 50m records, with proper indexing, a query like this shouldn't take longer than a second.

To add an index to the column, you can run this (I'm assuming SQL Server, your syntax may vary):

Create NonClustered Index Ix_YourTable_Status On YourTable (Status Asc);