bstew85 bstew85 - 3 months ago 8
SQL Question

How to replace incorrect field value with most common value based on identifier column in SQL?

I have a column of addresses and zip codes that are inconsistent for the same premise ID and I'd like to replace the incorrect address with the most commonly used address based on the common premise ID.

For example, the original table may look as follows and I'd like to make the street and zip columns consistent for each premise.

Date | Premise | House_No | Street | Zip
-----------------------------------------------------------
Jan | 43219 | 123 | E Haywood Dr | 31214
Feb | 43219 | 123 | Haywood Dr E | 31214-3291
Mar | 43219 | 123 | E Haywood Dr | 31214
Apr | 43219 | 123 | Haywood Dr E | 31214-3291
May | 43219 | 123 | E Haywood Dr | 31214
Jan | 43111 | 456 | W Simpson Wy | 31202
Feb | 43111 | 456 | W Simpson Wy | 31202
Mar | 43111 | 456 | W Simpson Wy | 31202
Apr | 43111 | 456 | Simpson Wy W | 31202-1022
May | 43111 | 456 | W Simpson Wy | 31202

Answer

Try it with an updateable CTE:

DECLARE @tbl TABLE (Mnth VARCHAR(100),Premise INT, House_No INT,Street VARCHAR(100),Zip VARCHAR(100));
INSERT INTO @tbl VALUES
 ('Jan',43219,123,'E Haywood Dr','31214')
,('Feb',43219,123,'Haywood Dr E','31214-3291')
,('Mar',43219,123,'E Haywood Dr','31214')
,('Apr',43219,123,'Haywood Dr E','31214-3291')
,('May',43219,123,'E Haywood Dr','31214')
,('Jan',43111,456,'W Simpson Wy','31202')
,('Feb',43111,456,'W Simpson Wy','31202')
,('Mar',43111,456,'W Simpson Wy','31202')
,('Apr',43111,456,'Simpson Wy W','31202-1022')
,('May',43111,456,'W Simpson Wy','31202');

--The first CTE just does a grouped count:

WITH Counted AS
(
    SELECT COUNT(Premise) AS [Counter]
          ,Premise
          ,House_No
          ,Street
          ,Zip
    FROM @tbl
    GROUP BY Premise,House_No,Street,Zip
)

--The second CTE finds the line with the highest count

--ATTENTION: If there are more than one with the same count the choice is rather random...

,MostCommon AS
(
    SELECT *
         ,ROW_NUMBER() OVER(PARTITION BY Premise ORDER BY [Counter] DESC) AS MaxCounter        
    FROM Counted
)

--This CTE is the updateable one: You collect the real table data together with the new values

,UpdateableCTE AS
(
    SELECT tbl.*
          ,mc.House_No AS NewHouse_No
          ,mc.Street AS NewStreet
          ,mc.Zip AS NewZip 
    FROM @tbl AS tbl
    INNER JOIN MostCommon AS mc ON mc.MaxCounter=1 AND mc.Premise=tbl.Premise
)

--And finally you set the new values

UPDATE UpdateableCTE SET House_No=NewHouse_No
                        ,Street=NewStreet
                        ,Zip=NewZip;

--Show the result

SELECT * FROM @tbl;