Chriz Chriz - 1 month ago 4
SQL Question

Efficient method of concatenating multiple strings from multiple linked records into single strings per record chain

I have a table which contains records relating to absence details for an employee; when an absence is updated, instead of the existing row being updated; instead a new row is created and the LinkedRecordId field of the previous record is populated with the Id of the new record, a NULL value in the LinkedRecordId field indicates that the record is the newest in the chain.

I need to be able to retrieve a data set for all absence records with the notes of each parent record combined in one string and then list the other fields from the newest record in the chain.

Here is a cut-down version of the table containing the data:

CREATE TABLE [dbo].[AbsenceData](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Notes] [varchar](max) NULL,
[LinkedRecordId] [int] NULL,
[CreatedAt] [datetime] NULL

)


My problem is performance; my current solution is the use of the following table-valued function which collects and concatenates the strings from all parent records in the chain:

CREATE FUNCTION [dbo].[AbsenceNotesFor](@AbsenceDataId INT)
RETURNS @return TABLE
(
AbsenceDataId INT
,Notes VARCHAR(MAX)
)
AS
BEGIN
DECLARE @notes VARCHAR(MAX)
;WITH AbsenceNotes AS (
SELECT
ad.Id
,ad.Notes
,ad.CreatedAt
FROM
AbsenceData ad WITH (NOLOCK)
WHERE
ad.Id = @absenceDataId
UNION ALL
SELECT
ad.Id
,ad.Notes
,ad.CreatedAt
FROM
AbsenceData ad WITH (NOLOCK)
INNER JOIN AbsenceNotes an ON an.Id = ad.LinkedRecordId
)

SELECT @notes = CONVERT(VARCHAR(11),CreatedAt, 105) + ' ' + CONVERT(VARCHAR(5),CreatedAt, 114)+ CHAR(13)+CHAR(10) + CAST(Notes AS VARCHAR(MAX)) + CHAR(13)+CHAR(10) + CHAR(13)+CHAR(10) + COALESCE(@notes,'')
FROM AbsenceNotes

INSERT INTO @return
SELECT AbsenceDataId = @AbsenceDataId, Notes = @notes

RETURN;
END


Here is a cut-down version of the current implementation:

SELECT
Id
,n.Notes
FROM AbsenceData
CROSS APPLY dbo.AbsenceNotesFor(Id) n
WHERE
LinkedRecordId IS NULL


When running this against a dataset of a few hundred records I can already see a performance issue which seems to be due to the logic within the table valued function.

I'm looking for a more efficient way of doing this, any ideas?

We are using MS SQL Server 2016 Standard

Here is the example on SQL Fiddle:
http://sqlfiddle.com/#!6/b9834

Answer

Table valued functions with multi-statement-syntax are well known for absolutely bad performance. You should avoid if ever possible the TVF-syntax with BEGIN and END.

Much better is the same function as inline or ad-hoc function. Without sample data this is a blind flight, but I think you would get the same result, but much better performance with this:

CREATE FUNCTION [dbo].[AbsenceNotesFor](@AbsenceDataId INT)
RETURNS TABLE
AS
    RETURN
    WITH AbsenceNotes AS (
       SELECT
           ad.Id
          ,ad.Notes
          ,ad.CreatedAt
       FROM
          AbsenceData ad WITH (NOLOCK) 
       WHERE 
          ad.Id = @absenceDataId
       UNION ALL
       SELECT
          ad.Id
          ,ad.Notes
          ,ad.CreatedAt
       FROM
          AbsenceData ad WITH (NOLOCK) 
          INNER JOIN AbsenceNotes an ON an.Id = ad.LinkedRecordId
    )

    SELECT @AbsenceDataId AS AbsenceDataId
         ,(
             REPLACE
             (  
                  STUFF
                  (
                      (
                        (
                            SELECT '|#|'+ '|#|' 
                                        + CONVERT(VARCHAR(11),CreatedAt, 105) + ' ' 
                                        + CONVERT(VARCHAR(5),CreatedAt, 114)
                                        + '|#|' 
                                        + CAST(Notes AS VARCHAR(MAX))
                            FROM AbsenceNotes   
                            FOR XML PATH(''),TYPE
                        ).value('.','nvarchar(max)')
                      ),1,6,''
                  ),'|#|',CHAR(13)+CHAR(10)
            )
          ) AS Notes;

Short explanation:

I did not touch your CTE.

The string concatenation with SELECT @variable=@variable + Something is a very bad performing procedural approach. I replaced this with FOR XML PATH(''). You can find a lot about this if you search for Group concat and Sql-Server...

I replaced the line-breaks with a magic-value (|#|) to avoid later problems.

The STUFF function does nothing else than cut away the 6 characters at the beginning (two times the magic value for the line break)

The REPLACE function changes the magic values back to real line-breaks.

If you want the Notes in descending order (your code looks like this) just add a proper ORDER BY to the inner SELECT.

Comments