Adam Watson Adam Watson - 7 months ago 8
SQL Question

SQL to match duplicates of multiple rows

I've got a table setup something like this:

id | key | val
1 | a | 1
1 | b | 2
2 | a | 3
2 | b | 4


where each id can have many key/value pairs associated with it. I'm trying to write a query that finds cases where 2 id's have a duplicate set of key/value pairs in which I can specify what keys to look for. For example, in the table below:

id | key | val
1 | a | 1 <
1 | b | 2 <
1 | c | 9
2 | a | 3
2 | b | 4
3 | a | 1 <
3 | b | 2 <
3 | c | 5


The query would return the notated rows above if I specified a and b as the keys too look for. So far I have the below query:

select *
from MY_TABLE inner join
(select id
from MY_TABLE
where key = 'a'
group by val
having count(val) > 1) T1 on T1.val = MY_TABLE.val and MY_TABLE.key = 'a'


which works great if I'm only looking for duplicates based on a single key, but not for multiples. Does anyone have advice on how to modify the above query so that I can specify 2 keys instead of just 1?

Edit: Things I've tried

The query below comes close (from Max Sorin's answer):

select a.*
from MY_TABLE a inner join
(select key,val
from MY_TABLE
where key in ('a','b')
group by key,val
having count(val) > 1) b on b.val = a.val and b.key = a.key


but only gives me duplicates regardless of id. To be more specific, I've shown the difference below:

With below table I want this But this query gives me this
id | key | val id | key | val id | key | val
1 | a | 1 1 | a | 1 1 | a | 1
1 | b | 2 1 | b | 2 1 | b | 2
1 | c | 9 3 | a | 1 3 | a | 1
2 | a | 3 3 | b | 2 3 | b | 2
2 | b | 4 4 | b | 2
3 | a | 1 5 | a | 1
3 | b | 2 6 | b | 2
3 | c | 5
4 | b | 2
5 | a | 1
6 | b | 2


Double Edit:

I've been asked for a more specific example, so here it is. Let's say I want to find all objects who have the same values for keys "Name" and "Age". If I had the below table:

id | key | value
1 | Name | 'John' <-- These rows
1 | Age | '25' <--
1 | Job | 'Farmer'
2 | Name | 'Steve'
2 | Age | '30'
3 | Name | 'John' <-- and these rows would be returned
3 | Age | '25' <--
3 | Job | 'Plumber'
4 | Name | 'John' <-- But not either of these rows, because Age is different.
4 | Age | '26' <--

Answer

Test Data

CREATE TABLE #mytable(id INT, [key] VARCHAR(10), value VARCHAR(50))

INSERT INTO #mytable
        ( id, [key], value )
VALUES  
(1  ,'Name', 'John'),
(1  ,'Age', '25'),
(1  ,'c', '9'),
(2  ,'Name', 'Bob'),
(2  ,'Age', '4'),
(3  ,'Name', 'John'),
(3  ,'Age', '25'),
(3  ,'c', '5'),
(4  ,'Name', 'Samuel'),
(4  ,'Age', '25'),
(5  ,'Name', 'John'),
(5  ,'Age', '36')

Pivoted table will hold results of making a single row for each Name-Age pair:

CREATE TABLE #pivoted (id int, Name VARCHAR(50), Age VARCHAR(10)) 
INSERT INTO #pivoted
        ( id, Name, Age )

SELECT pv.id, pv.Name, pv.Age  
    FROM (SELECT id, [key], value
    FROM #mytable
    WHERE [key] IN ('Name', 'Age')) src

    PIVOT (MAX(value) FOR [Key] IN ([Name], [Age])) AS pv

Pivot function throws away unused keys, and turns the vertical list of values belonging to a single ID into columns.

We are looking for all the Rows in this new table that have the same Name-Age pair and not the same ID to locate duplicated people.

SELECT a.*
FROM #pivoted a
JOIN #pivoted b ON b.Age = a.Age AND b.Name = a.Name        
                AND b.id <> a.id

DROP TABLE #mytable
DROP TABLE #pivoted

I think that the schema is designed the way it is to allow the user to add dimensions to the data without adding columns to the table. If I wanted to store John's height I would simply add this row: 1, 'Height', '189cm'

Comments