Ash Ash - 1 year ago 71
SQL Question

Find and update non duplicated record based on one of the column

I want to find all the non duplicated records and update one of the column.

Ex.

Col_1 | Col_2 | Col_3 | Col_4 | Col_5
A | AA | BB | 1 |
A | AB | BC | 2 |
A | AC | BD | 3 |
B | BB | CC | 1 |
B | BB | CC | 2 |
C | CC | DD | 1 |


My query has to group by Col_1, and I want to find out not unique record based on Col_2 and Col3 and then update the Col_5.

Basically output should be as below,

Col_1 | Col_2 | Col_3 | Col_4 | Col_5
A | AA | BB | 1 | 1
A | AB | BC | 2 | 1
A | AC | BD | 3 | 1
B | BB | CC | 1 | 0
B | BB | CC | 2 | 0
C | CC | DD | 1 | 0


Does anyone have an idea how can I achieve this? This is a large database, so performance is also a key factor.

Thanks heaps,

Answer Source

There are plenty ways to do it. This solution comes from postgres for which I have access to, but I bet it will be working also on tsql as should have common syntax.

;WITH
cte_1 AS (
    SELECT col_1 FROM some_table GROUP BY col_1 HAVING count(*) > 1
),
cte_2 AS (
    SELECT col_1 FROM some_table GROUP BY col_1, col_2, col_3 HAVING count(*) > 1
),
cte_3 AS (
    SELECT cte_1.col_1 FROM cte_1
    LEFT JOIN cte_2 ON cte_1.col_1 = cte_2.col_1
    WHERE cte_2.col_1 IS NULL
)
UPDATE some_table SET col_5 = 1
FROM cte_3 WHERE cte_3.col_1 = some_table.col_1;

So, what happens above?

  1. First we build three CTE semi-tables which allow us to split logic into smaller parts:

    • cte_1 which extracts rows which can have multiple col2 and col_3 rows
    • cte_2 which selects those which have non-unique col_2 and col_3
    • cte_3 which returns those col_1 which have unique col_2 and col_3 just by LEFT JOIN
  2. Using the last cte_3 structure we are able to update some_table correctly

I assume that your table is called some_table here. If you're worring about a performance you should provide some primary key here and also it would be good to have indexes on col_2 and col_3 (standalone but it may help if those would be composite on (col_1, col_2) and so on).

Also you may want to move it from CTE to use temporary tables (which could be also indexed to gain efficiency.

Please also note that this query works fine with your example but without real data it may be just guessing. I mean that what will happen when you would have for col_1=A some unique and non-uniqe col_2 in the same time?

But I believe it's good point to start.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download