bnussey bnussey - 3 months ago 16
SQL Question

Rails: Delete duplicate records based on multiple columns

In our system we run hourly imports from an external database. Due to an error in the import scripts, there are now some duplicate records.

A duplicate is deemed where any record has the same

:legacy_id
and
:company
.

What code can I run to find and delete these duplicates?

I was playing around with this:

Product.select(:legacy_id,:company).group(:legacy_id,:company).having("count(*) > 1")


It seemed to return some of the duplicates, but I wasn't sure how to delete from there?

Any ideas?

Answer

You can try the following approach:

Product.group_by{|x| [x.legacy_id, x.company]}
       .flat_map{|_,x| x.drop(1)}
       .delete_all

Or pure ActiveRecord:

Product.where.not(id: Product.group(:legacy_id, :company).pluck('min(products.id)'))
       .delete_all

Or pure sql:

delete from products
where id not in ( 
   select min(p.id) from products p group by p.legacy_id, p.company
)
Comments