SimonSimCity SimonSimCity - 4 months ago 14
SQL Question

How to add sequence number for groups (new number if same group occurs again) in an SQL query

I have a question quite similar to How to add sequence number for groups in a SQL query without temp tables

To take the sample from there, I have a data-structure that consists of the first two columns of this table, and I want to generate the third column in my query:


Record Group GroupSequence
-------|---------|--------------
1 Chickens 1
2 Chickens 1
3 Horses 2
4 Cows 3
5 Horses 4
6 Horses 4


The difference to the quoted query is, that I need a new number if the previous line didn't have the same value in the second column.

I tried adding a row-number and using
LAG()
to check if the previous line had the same value - if so, take the previous value of this generated column, if not, take the row-number - but it looks like you can't reuse the column you're building.

The value just needs to be a different number - it doesn't matter if it's in order. This would also be fine:


Record Group GroupSequence
-------|---------|--------------
1 Chickens 1
2 Chickens 1
3 Horses 3
4 Cows 4
5 Horses 5
6 Horses 5

Answer

I think this is what you need:

WITH Src AS
(
    SELECT * FROM (VALUES 
    (1, 'Chickens'),
    (2, 'Chickens'),
    (3, 'Horses  '),
    (4, 'Cows    '),
    (5, 'Horses  '),
    (6, 'Horses  '))T(Record, [Group])
), Differentiator AS
(
    SELECT *,
    ROW_NUMBER() OVER (ORDER BY Record) -
        RANK() OVER (PARTITION BY [Group] ORDER BY Record) Diff
    FROM Src
)
SELECT Record, [Group], DENSE_RANK() OVER (ORDER BY [Group],Diff) NewGroup
FROM Differentiator
ORDER BY Record

It produces following table:

Record   Group      NewGroup
------   -----      --------
1        Chickens   1
2        Chickens   1
3        Horses     3
4        Cows       2
5        Horses     4
6        Horses     4

Short explanation:

The key is to calculate relative positions of records in whole table and in each '[Group]' group. If records are adjacent, global number is increasing by 1 and local number is increased by 1. Thus, ROW_NUMBER() - RANK() is the same for all records. If there is a gap, there is also distortion in global numbering. It leads to different numbers generated by ROW_NUMBER() - RANK() in separated groups.