TangoKilo TangoKilo - 4 months ago 9
SQL Question

Eliminating duplicate records in SQL

I have a table called

attribute_value
with the following columns

attribute_id | start_date | value | latest_ind | mod_dtime


The
latest_ind
column can have a value of either 1 or 0.

I basically want to run an update script on this table which finds all the attributes that have a common
start date
and a
latest_ind
equal to one and set the latest ind to zero EXCEPT in the case where the record is the latest one.

I've managed to put together the following SELECT query but I have no idea how I would go about converting it into an update. Any pointers would be appreciated

SELECT av.attribute_id, av.start_date, count(latest_ind), max(mod_dtime)
FROM t_attribute_value av
where latest_ind = 1
group by attribute_id, start_date
having count(latest_ind) > 1

Answer

This is a case where an UPDATE using a CTE comes in handy:

;WITH ToUpdate AS (
   SELECT latest_ind,
          ROW_NUMBER() OVER (PARTITION BY attribute_id, start_date 
                             ORDER BY mod_dtime DESC) AS rn
   FROM attribute_value
   WHERE latest_ind = 1
)
UPDATE ToUpdate
SET latest_ind = 0
WHERE rn > 1

The update operation is propagated to the real table. Hence, in case of a attribute_id, start_date partition with a population greater than one, all records but the lastest are updated.