MAV - 1 year ago 62

SQL Question

Say I have a data like the following:

`X | 2/2/2000`

X | 2/3/2000

B | 2/4/2000

B | 2/10/2000

B | 2/10/2000

J | 2/11/2000

X | 3/1/2000

I would like to get a dataset like this:

`1 | X | 2/2/2000`

1 | X | 2/3/2000

2 | B | 2/4/2000

2 | B | 2/10/2000

2 | B | 2/10/2000

3 | J | 2/11/2000

4 | X | 3/1/2000

So far everything I have tried has either ended up numbering each change resetting the count on each field value change or in the example leave the last X as 1.

Answer Source

This is a gaps and islands problem. You can use a difference of row numbers:

```
select dense_rank() over (order by col1, seqnum_1 - seqnum_2) as col0,
col1, col2
from (select t.*,
row_number() over (order by col2) as seqnum_1,
row_number() over (partition by col1 order by col2) as seqnum_2
from t
) t;
```

Explaining why this works is a bit cumbersome. If you run the subquery, you will see how the sequence numbers are assigned and why the difference is what you want.