codevour codevour - 3 months ago 9
SQL Question

Delete duplicate entries via SQL?

Is there any possibility in SQL to remove (only one) duplicate entries of composed columns (here: city, zip)? So if i have this SQL:

INSERT INTO foo (id, city, zip) VALUES (1, 'New York', '00000')
INSERT INTO foo (id, city, zip) VALUES (2, 'New York', '00000')


Can i remove the first later with a sql statement? My approach doesn't work for that

DELETE FROM foo (id, city, zip)
WHERE id IN
(SELECT id FROM foo GROUP BY id HAVING (COUNT(zip) > 1))

Answer

Adapted from this article. These two solutions are generic, and should work on any reasonable SQL implementation.

Remove duplicates in-place:

DELETE T1
FROM foo T1, foo T2
WHERE (T1.city = T2.city AND foo1.zip=foo2.zip) -- Duplicate rows
   AND T1.id > T2.id;                           -- Delete the one with higher id

Simple, and should work fine for small tables or tables with little duplicates.

Copy distinct records to another table:

CREATE TABLE foo_temp LIKE(foo);
INSERT INTO foo_temp (SELECT distinct city, zip) FORM foo;
TRUNCATE TABLE foo;

If you're lucky enough to have a sequence as your id, simply:

INSERT INTO foo SELECT * FROM foo_temp;
DROP TABLE foo_temp;

A bit more complicated, but extremely efficient for very large tables with lots of duplicates. For these, creating an index for (city, zip) would incredibly improve the query performance.