Alan Alan - 19 days ago 7
SQL Question

Will a SQL DELETE with a sub query execute inefficiently if there are many rows in the source table?

I am looking at an application and I found this SQL:

DELETE FROM Phrase
WHERE PhraseId NOT IN(SELECT Id FROM PhraseSource)


The intention of the SQL is to delete rows from Phrase that are not in the PhraseSource table.

The two tables are identical and have the following structure

Id - GUID primary key
...
...
...
Modified int


the
...
columns are about ten columns containing text and numeric data. The
PhraseSource
table may or may not contain more recent rows with a higher number in the Modified column and different text and numeric data.

Can someone tell me will this query execute the SELECT Id from PhraseSource for every row in the Phrase table? If so is there a more efficient way that this could be coded.

Answer

In this case the sub-query could be evaluated for each row if the database system is not smart enough (but in case of MS SQL Server, I suppose it should be able to recognize the fact that you don't need to evaluate the subquery more than once).

Still there is a better solution:

DELETE p 
FROM Phrase p
LEFT JOIN PhraseSource ps ON ps.Id = p.PhraseId
WHERE ps.Id IS NULL

This uses the LEFT JOIN which matches the rows of both tables, but in case there is no match it leaves the ps entry NULL. Now you just check for NULLs on the left side to see which Phrases do not have a match and will delete those.

All types of JOIN statements are very nicely described in this answer.

Here you can see three different approaches for a similar issue compared on MySQL. As @Drammy mentions, to actually see the performance of a given approach, you could see the execution plan on your target database and do performance testing on different approaches of the same problem.

Comments