Jaki Jaki - 4 months ago 12
SQL Question

Remove duplicate rows in mysql, retaining only that with the lowest id

I want to delete all rows with the same values in all columns except

id
, except the row with the lowest
id
.

Given this data:

id year file name language download
1 2014 a x h d
2 2014 a c i d
3 2014 a x h d
4 2014 a x h d
5 2015 b y j d
6 2015 b y j d
7 2015 b y j d


Here I want to delete 3 and 4 (leaving 1 - the lowest id), and delete 6, 7 (leaving 5).

"Matching" means both rows have the same 'year', 'file', 'name', 'language' and 'download'.

There might be more then 10 duplicate rows. I want to delete all the rows, but keeping the lowest id row.

Answer

Use mysql's multi-table delete syntax with a self-join:

delete b
from mytable a, mytable b
where b.year = a.year
and b.file = a.file
and b.name = a.name
and b.language = a.language
and b.download = a.download
and b.id > a.id -- this condition picks the higher id(s) for deletion

This should perform quite well, since it's essentially a single join.