pmorales pmorales - 4 months ago 14
SQL Question

Pick one row for each pair of parent-child and child-parent relationships

I'm stuck with this problem and I could use some help.

I'm working with a table called dependencies.

A simple example

ID, parent, dependent, relationship
1234, John, Mike, Parent
1235, Mike, John, Child
1236, Nancy, John, Spouse
1237, John, Nancy, Spouse
1238, Peter, Mike, Sibling
1239, Mike, Peter, Sibling


Some of this dependencies are "mirror dependencies" (Like 1234 and 1235)


  • John is Parent of Mike

  • Mike is Child of John



The requirement is to retrieve one relationship by each pair of users, meaning
we need to include one record per each pair (John, Mike), (John, Nancy) and (Peter, Mike) (its actually a
person_key
instead of name, so shouldn't be any duplication, but for the sake of the example I use the name)

1234, John, Mike, Parent
1237, John, Nancy, Spouse
1238, Peter, Mike, Sibling


or

1235, Mike, John, Child
1236, Nancy, John, Spouse
1239, Mike, Peter, Sibling


Any idea how to use SQL in order to skip the mirror records from the set?

Answer

Sample data

DECLARE @Dependencies TABLE
([ID] int, [parent] varchar(50), [dependent] varchar(50), [relationship] varchar(50));

INSERT INTO @Dependencies
([ID], [parent], [dependent], [relationship])
VALUES
(1234, 'John', 'Mike', 'Parent'),
(1235, 'Mike', 'John', 'Child'),
(1236, 'Nancy', 'John', 'Spouse'),
(1237, 'John', 'Nancy', 'Spouse'),
(1238, 'Peter', 'Mike', 'Sibling'),
(1239, 'Mike', 'Peter', 'Sibling');

Query

Calculate MIN and MAX of (parent, dependent) and then you can group them together.

SELECT
    ID
    ,CASE WHEN [parent] < [dependent] THEN [parent] ELSE [dependent] END AS MinRelationship
    ,CASE WHEN [parent] > [dependent] THEN [parent] ELSE [dependent] END AS MaxRelationship
    ,[relationship]
FROM @Dependencies
;

Result

+------+-----------------+-----------------+--------------+
|  ID  | MinRelationship | MaxRelationship | relationship |
+------+-----------------+-----------------+--------------+
| 1234 | John            | Mike            | Parent       |
| 1235 | John            | Mike            | Child        |
| 1236 | John            | Nancy           | Spouse       |
| 1237 | John            | Nancy           | Spouse       |
| 1238 | Mike            | Peter           | Sibling      |
| 1239 | Mike            | Peter           | Sibling      |
+------+-----------------+-----------------+--------------+

The rest depends on which row from each pair you want to choose. For example, we may want to choose a row with the smallest ID. CTE_MinMax is the simple query above. CTE_rn adds a number to each row partitioned by the pair and ordered by ID. The final SELECT returns only one row for each pair.

The query will work correctly if there is only one entry (not a pair), or if there are more than 2 entries.

WITH
CTE_MinMax
AS
(
    SELECT
        ID
        ,CASE WHEN [parent] < [dependent] THEN [parent] ELSE [dependent] END AS MinRelationship
        ,CASE WHEN [parent] > [dependent] THEN [parent] ELSE [dependent] END AS MaxRelationship
        ,[relationship]
    FROM @Dependencies
)
,CTE_rn
AS
(
    SELECT
        ID
        ,MinRelationship
        ,MaxRelationship
        ,relationship
        ,ROW_NUMBER() OVER (PARTITION BY MinRelationship, MaxRelationship ORDER BY ID) AS rn
    FROM CTE_MinMax
)
SELECT
    ID
    ,MinRelationship
    ,MaxRelationship
    ,relationship
FROM CTE_rn
WHERE rn = 1
;

Result

+------+-----------------+-----------------+--------------+
|  ID  | MinRelationship | MaxRelationship | relationship |
+------+-----------------+-----------------+--------------+
| 1234 | John            | Mike            | Parent       |
| 1236 | John            | Nancy           | Spouse       |
| 1238 | Mike            | Peter           | Sibling      |
+------+-----------------+-----------------+--------------+
Comments