dimebucker91 dimebucker91 - 2 months ago 11
SQL Question

deriving sequence from columns in SQL server

I have data that looks something like:

ID1 ID2 ID3 ID4
123 32 43 123
56 67 56 89
123 56 123 56


which basically describes a sequence starting at ID1 and ending at ID4. What I am interested in is just extracting the pattern, and not the IDs involved. for example, the pattern in the first row would be:

ABCA: since it starts at an ID, goes to a new ID (B), then another new ID (C), and back to the original ID (A).

For the second row it would be : ABAC

and for the third it would be: ABAB.

I am looking for an efficient way to do this in sql server instead of using a massive if statement for each potential case.

Answer

Hmmm. Here is a brute force method:

select 'A' +
       (case when id2 = id1 then 'A' else 'B' end) +
       (case when id3 = id1 then 'A'
             when id3 = id2 then 'B'
             when id2 = id1 then 'B'
             else 'C'
        end) +
       (case when id4 = id1 then 'A'
             when id4 = id2 then 'B'
             when id4 = id3 and id2 = id1 then 'B'
             when id3 = id2 then 'C'
             when id2 = id1 then 'C'
             else 'D'
        end)

This is a bit complicated, but something like this should work.

EDIT:

Here is another method that should work:

select t.*, pattern
from t outer apply(
     (select (max(case when id = 1 then val end) +
              max(case when id = 2 then val end) +
              max(case when id = 3 then val end) +
              max(case when id = 4 then val end)
             ) pattern
      from (select v.*,
                   char(ascii('A' + dense_rank() over (order by minpos) - 1)) as val
            from (select v.*, min(pos) over (partition by id) as minpos
                  from (values(id1, 1), (id2, 2), (id3, 3), (id4, 4)) as v(id, pos)
                 ) v
            ) v
      ) v;

Explaining how this works is quite a challenge. The values() command pivots the data into rows, so the first row ends up like:

id    pos
123     1
 32     2
 43     3
123     4

The next level puts the minimum pos where the value is found:

id    pos    minpos
123     1      1
 32     2      2
 43     3      3
123     4      1

(Note: it is a coincidence that the numbers are sequential.)

Then the dense_rank() turns this into letters:

id    pos    minpos   val
123     1      1       A
 32     2      2       B
 43     3      3       C
123     4      1       A

And the final aggregation puts this into the pattern ABCA.