Alan Alan - 11 months ago 59
SQL Question

How does a DELETE FROM with a SELECT in the WHERE work?

I am looking at an application and I found this SQL:

WHERE Modified < (SELECT Modified FROM PhraseSource WHERE Id = Phrase.PhraseId)

The intention of the SQL is to delete rows from Phrase where there are more recent rows in the PhraseSource table.

Now I know the tables Phrase and PhraseSource have the same columns and Modified holds the number of seconds since 1970 but I cannot understand how/why this works or what it is doing. When I look at it then it seems like on the left of the < it is just one column and on the right side of the > it would be many rows. Does it even make any sense?

The two tables are identical and have the following structure

Id - GUID primary key
Modified int

columns are about ten columns containing text and numeric data. The
table may or may not contain more recent rows with a higher number in the Modified column and different text and numeric data.

Answer Source

The SELECT statement in parenthesis is a sub-query or nested query.

What happens is that for each row, the Modified column value is compared with the result of the sub-query (which is run once for each of the rows in the Phrase table).

The sub-query has a WHERE statement, so it finds a row that has the same ID as the row from Phrase table that we are currently evaluating and returns the Modified value (which is for a sigle row, actually a single scalar value).

The two Modified values are compared and in case the Phrase's row has been modified before the row in PhraseSource, it is deleted.

As you can see this approach is not efficient, because it requires the database to run a separate query for each of the rows in the Phrase table (although I imagine that some databases might be smart enough to optimize this a little bit).

A better solution

The more efficient solution would be to use INNER JOIN:

DELETE p FROM Phrase p
INNER JOIN PhraseSource ps
  ON p.PhraseId=ps.Id 
WHERE p.Modified < ps.Modified

This should do the exact same thing as your query, but using efficient JOIN mechanism. INNER JOIN uses the ON statement to choose how to "match" rows in two different tables (which is done very efficiently by the DB) and then again compares the Modified values of matching rows.