Elroy Skimms Elroy Skimms - 3 months ago 16
SQL Question

SQL Recursive CTE Linked Chains

I've included a simplified dataset for copy/paste. There would be a table of chained components, linked in a specific order. In production, this table would contain hundreds of thousands of chains, each chain containing 5-10 components.

The input will also be chains and I will search the database for any instance where the database chain has the same components in the same order as the input chains. For example, if the database has chain A, B, C, D, E, F, G and I input chains B and A, B, C and D, F, Z; the results will show me:
Input chain B is a match
Input chain A, B, C is a match
Input chain D, F, Z is not a match

Ideally I'd like to be able to input tens of thousands of chains to search against the hundreds of thousands of chains in the database.

I have a solution in VB.Net that grabs all chains from the database that match the first component of each input chain. It then recursively navigates down the database chain until it reaches the end OR it finds that the database chain does not match. It's not elegant and not efficient as it can only be looking at one input chain at a time. I could just as easily write something in SQL that uses a Cursor to the same effect, but again, not very efficient.

I'm trying to find a recursive CTE that would allow me to run the entire query of thousands of inputs at one time.

Here is a simplified dataset:

DECLARE @Chains TABLE (ID int, Component varchar(15), ParentID int, DisplayOrder int, ChainID int)
DECLARE @SearchChains TABLE (Component varchar(15), DisplayOrder int, GroupID int)

INSERT INTO @Chains (ID, Component, ParentID, DisplayOrder, ChainID) VALUES (1, 'Head bone', 0, 0, 1)
INSERT INTO @Chains (ID, Component, ParentID, DisplayOrder, ChainID) VALUES (2, 'Neck bone', 1, 1, 1)
INSERT INTO @Chains (ID, Component, ParentID, DisplayOrder, ChainID) VALUES (3, 'Shoulder bone', 2, 2, 1)
INSERT INTO @Chains (ID, Component, ParentID, DisplayOrder, ChainID) VALUES (4, 'Back bone', 3, 3, 1)
INSERT INTO @Chains (ID, Component, ParentID, DisplayOrder, ChainID) VALUES (5, 'Hip bone', 4, 4, 1)
INSERT INTO @Chains (ID, Component, ParentID, DisplayOrder, ChainID) VALUES (6, 'Head bone', 0, 0, 2)
INSERT INTO @Chains (ID, Component, ParentID, DisplayOrder, ChainID) VALUES (7, 'Shoulder bone', 6, 1, 2)

INSERT INTO @SearchChains(Component, DisplayOrder, GroupID) VALUES ('Back bone', 0, 1)
INSERT INTO @SearchChains(Component, DisplayOrder, GroupID) VALUES ('Hip bone', 1, 1)
INSERT INTO @SearchChains(Component, DisplayOrder, GroupID) VALUES ('Leg bone', 2, 1)
INSERT INTO @SearchChains(Component, DisplayOrder, GroupID) VALUES ('Head bone', 0, 2)
INSERT INTO @SearchChains(Component, DisplayOrder, GroupID) VALUES ('Neck bone', 1, 2)
;

WITH cteMatching (ID, Component, ParentID, DisplayOrder, ChainID, RecLevel)
AS
(
SELECT C.ID, C.Component, C.ParentID, C.DisplayOrder, C.ChainID, 1 as RecLevel
FROM @Chains C
WHERE DisplayOrder = 0
UNION ALL
SELECT C.ID, C.Component, C.ParentID, C.DisplayOrder, C.ChainID, cte.RecLevel + 1
FROM @Chains C
INNER JOIN cteMatching cte ON C.ParentID = cte.ID
)

SELECT SC.Component, SC.DisplayOrder, SC.GroupID, cte.ID, cte.Component, cte.ParentID,
ISNULL(cte.DisplayOrder, 2147483647) as cteDisplayOrder, cte.ChainID, cte.RecLevel,
ISNULL((SELECT 1 WHERE SC.Component = cte.Component), 0) as IsMatch
FROM @SearchChains SC
LEFT OUTER JOIN cteMatching cte ON SC.Component = cte.Component
ORDER BY SC.GroupID ASC, cte.ChainID ASC, SC.DisplayOrder ASC, cteDisplayOrder ASC


The results are:

Component DisplayOrder GroupID ID Component ParentID cteDisplayOrder ChainID RecLevel IsMatch
--------------- ------------ ----------- ----------- --------------- ----------- --------------- ----------- ----------- -----------
Leg bone 2 1 NULL NULL NULL 2147483647 NULL NULL 0
Back bone 0 1 4 Back bone 3 3 1 4 1
Hip bone 1 1 5 Hip bone 4 4 1 5 1
Head bone 0 2 1 Head bone 0 0 1 1 1
Neck bone 1 2 2 Neck bone 1 1 1 2 1
Head bone 0 2 6 Head bone 0 0 2 1 1


The problem here is that SearchChain GroupID 2 (Neck bone-> Head bone) should show a Match with database ChainID 1 (which it does) and it should also show that the Head components match between GroupID 2 and ChainID 2, Neck does not match Shoulder in ChainID 2. But when I comment out ChainID 1 and SearchChain GroupID 1, the results for ChainID 2 and SearchChain GroupID 2 are correct. This is what has me stumped.

The goal is to be able to run multiple SearchChains through multiple database Chains at the same time, and at the moment, my attempts are failing. Does anyone have any suggestions?

-E

Answer

Besides the fact, that the design of your Chains smells (see below) I want to offer you this solution: Both Chains are concatenated to a path and then the search is performed via LIKE

DECLARE @Chains TABLE (ID int, Component varchar(15), ParentID int, DisplayOrder int, ChainID int)
DECLARE @SearchChains TABLE (Component varchar(15), DisplayOrder int, GroupID int)

INSERT INTO @Chains (ID, Component, ParentID, DisplayOrder, ChainID) VALUES (1, 'Head bone', 0, 0, 1)
INSERT INTO @Chains (ID, Component, ParentID, DisplayOrder, ChainID) VALUES (2, 'Neck bone', 1, 1, 1)
INSERT INTO @Chains (ID, Component, ParentID, DisplayOrder, ChainID) VALUES (3, 'Shoulder bone', 2, 2, 1)
INSERT INTO @Chains (ID, Component, ParentID, DisplayOrder, ChainID) VALUES (4, 'Back bone', 3, 3, 1)
INSERT INTO @Chains (ID, Component, ParentID, DisplayOrder, ChainID) VALUES (5, 'Hip bone', 4, 4, 1)
INSERT INTO @Chains (ID, Component, ParentID, DisplayOrder, ChainID) VALUES (6, 'Head bone', 0, 0, 2)
INSERT INTO @Chains (ID, Component, ParentID, DisplayOrder, ChainID) VALUES (7, 'Shoulder bone', 6, 1, 2)

INSERT INTO @SearchChains(Component, DisplayOrder, GroupID) VALUES ('Back bone', 0, 1)
INSERT INTO @SearchChains(Component, DisplayOrder, GroupID) VALUES ('Hip bone', 1, 1)
INSERT INTO @SearchChains(Component, DisplayOrder, GroupID) VALUES ('Leg bone', 2, 1)
INSERT INTO @SearchChains(Component, DisplayOrder, GroupID) VALUES ('Head bone', 0, 2)
INSERT INTO @SearchChains(Component, DisplayOrder, GroupID) VALUES ('Neck bone', 1, 2);

--This is your query

WITH recChains AS
(
    SELECT c.ChainID,c.ID,c.ParentID,CAST('|' + c.Component AS NVARCHAR(MAX)) AS ChainAsPath,1 AS Step
    FROM @Chains AS c
    WHERE c.ParentID=0
    UNION ALL
    SELECT nxt.ChainID,nxt.ID,nxt.ParentID,prv.ChainAsPath + '|' + nxt.Component,prv.Step+1
    FROM recChains AS prv
    INNER JOIN @Chains AS nxt ON prv.ChainID=nxt.ChainID AND prv.ID=nxt.ParentID
)
,ChainsPath AS
(
    SELECT ChainID,ChainAsPath
    FROM
    (
        SELECT ChainAsPath
              ,ChainID
              ,ROW_NUMBER() OVER(PARTITION BY ChainID ORDER BY Step DESC) AS StepOrderRev
        FROM recChains
    ) AS tbl
    WHERE StepOrderRev=1
)
,SearchPath AS
(
    SELECT GroupID
          ,(
            SELECT CAST('|' + Component AS NVARCHAR(MAX)) 
            FROM @SearchChains AS p
            WHERE p.GroupID=sc.GroupID
            FOR XML PATH('')
           ) AS SearchPath
    FROM @SearchChains AS sc
    GROUP BY GroupID
)

SELECT *
      ,CASE WHEN ChainAsPath LIKE '%' + SearchPath + '%' THEN 'X' ELSE '' END AS Match 
FROM ChainsPath
CROSS JOIN SearchPath

I must admit, that I did not fully understand your expected result, hope this is what you need:

+---------+-------------------------------------------------------+---------+------------------------------+-------+
| ChainID | ChainAsPath                                           | GroupID | SearchPath                   | Match |
+---------+-------------------------------------------------------+---------+------------------------------+-------+
| 1       | |Head bone|Neck bone|Shoulder bone|Back bone|Hip bone | 1       | |Back bone|Hip bone|Leg bone |       |
+---------+-------------------------------------------------------+---------+------------------------------+-------+
| 2       | |Head bone|Shoulder bone                              | 1       | |Back bone|Hip bone|Leg bone |       |
+---------+-------------------------------------------------------+---------+------------------------------+-------+
| 1       | |Head bone|Neck bone|Shoulder bone|Back bone|Hip bone | 2       | |Head bone|Neck bone         | X     |
+---------+-------------------------------------------------------+---------+------------------------------+-------+
| 2       | |Head bone|Shoulder bone                              | 2       | |Head bone|Neck bone         |       |
+---------+-------------------------------------------------------+---------+------------------------------+-------+

Design

The pattern of parentID pointing to another row of the same set (self-reference) is clumsy... You choose this for trees and paths and - in general - graphs. It must be gap-less. A plain chain is much easier configured as a list with a ranking. You might use one column for the display rank and another for the technical chain. Like you do it with your search chains... ORDER BY is faster and cleaner (and allows gaps), than a recursive CTE, which is a hidden RBAR and therefore rather slow...

Design 2

I'd advise you not to use the words itself for the path, but rather use a catalog table for your components and build the path from their IDs. This is cleaner, better to maintain and faster.