SCFi SCFi - 7 months ago 11
SQL Question

SQL Server: Grouping heirarchical data

I have a document table - see below for definition

In this table We have a root document which has a

OriginalDocID
of null

Every time a revision is made a new entry is added with the parents
documentID
as the
OriginalDocID


What I am looking to do is to be able to group/partition everything around the Original document that has
OriginalDocID
of null

Each document can have multiple revisions from one point of origin.

meaning we can have

Doc Id 1 -> 2 -> 3
2 -> 8 -> 9
1 -> 4 -> 7
5 -> 10


So what I would hope to see back is all the rows with the root document. appended

I hope this makes sense. For the life of me I can't wrap my head around a sufficient query.

CREATE TABLE [dbo].[Document](
[DocumentID] [int] IDENTITY(1,1) NOT NULL,
[DocumentName] [varchar](max) NOT NULL,
[ContentType] [varchar](50) NULL,
[DocumentText] [varchar](max) NULL,
[DateCreated] [datetime] NULL,
[DocumentTypeId] [int] NULL,
[Note] [varchar](8000) NULL,
[RefID] [int] NULL,
[Version] [int] NULL,
[Active] [bit] NULL,
[OriginalDocID] [int] NULL
)

Answer

You'll need to use a Recursive CTE to do this. That's a query that refers back to itself so it can traverse a hierarchy and gather information as it works it's way down (or up) the levels of that hierarchy.

In your case, something like:

WITH RECURSIVE docCTE AS
(
    /* Recursive Seed */
    SELECT
        cast(null as int) as parentdoc
        documentID,
        0 as depth,
        documentid as originalDocument,
        CAST(null as varchar(100) as docpath
    FROM
        dbo.document
    Where originalDocID IS NULL

    UNION ALL

    /* Recursive Term */
    SELECT
        docCTE.DocumentID as parentdoc,
        document.documentID,
        depth + 1 as depth,
        docCTE.originalDocument,
        docCTE.Path + '>' + document.documentID
    FROM
        docCTE
        INNER JOIN dbo.document on doccte.document = document.originalDocID
    WHERE
        depth <= 15 /*Keep it from cycling in case of bad hierarchy*/

)

SELECT * FROM docCTE;

The recursive CTE is made up of two parts.

  1. The recursive seed, which is what we use to kick of the query. This is all document records where the originalDocID is null.

  2. The recursive term, where we join the table back to the recursive CTE establishing the parent/child relationship.

In your case we capture the documentid in the Recursive Seed as the originalDoc so that we can bring that down through each record found when we start traversing the hierarchy of documents.

These can be a little overwhelming when you get started, but after you write it a few times, it's second nature (and you'll find the really really helpful as you encounter more of this type of data).