Austin McCool Austin McCool - 6 months ago 3
SQL Question

SQL - Group rows by date ranges with contiguous values in row

Note: I have seen the answers for
SQL Group by Date Range and
Group rows by contiguous date ranges for groups of values and both mention the use of row_number(). I have not been able to figure this one out even with the help of these answers, and I've finally caved in and come here for help after a full day of contemplation and research.
EDIT: Now that I understand Gordon Linoff's answer, I realize how similar my question was to the others I linked and apologize for the near-duplicate post simply because I didn't understand the answers.

I have an input table containing a history of "results" of an inquiry into a client, which is consolidated into a view by the most recent result for each month of each year. The view looks (schematically) like this:

client_id | month | year | result
1 2 2016 Y
2 3 2016 N
1 4 2016 Y
1 5 2016 N
2 4 2016 N

and so I am trying to output a result view like this:

client_id | start_month | start_year | end_month | end_year | result
1 2 2016 3 2016 Y
2 3 2016 (NULL) (NULL) N
1 5 2016 (NULL) (NULL) N

Reasons why I am struggling:

  1. Most explanations of row_number() and how it applies here aren't getting through to me

  2. I am working with, rather than date values, columns derived from month-year type inquires originally created in another view from DATEPART() calls, and as this is SQL 2008, I can't turn these back into dates (like 4-1-2016 instead of 4,2016) easily with DATEFROMPARTS()


Your data is exactly the same as the data in the other questions -- except for the fact that the values are split into two columns.

To understand what is happening, start with this query:

select t.*,
       row_number() over (partition by client_id order by start_year, start_month) as seqnum_client,
       (start_year * 12 + start_month) as month_counter
from t
order by client_id, start_year, start_Month;

Then take the difference of the last two columns. You will note that these are constant for contiguous periods of time. That becomes the grouping key.

I'll leave you with this version:

select client_id,
       min(start_year * 100 + start_month) as start_yyyymm,
       max(start_year * 100 + start_month) as end_yyyymm,
       max(result) as result
from (select t.*,
           row_number() over (partition by client_id order by start_year, start_month) as seqnum_client,
           (start_year * 12 + start_month as month_counter
     from t
    ) t
group by client_id, (seqnum_client - month_counter)
order by client_id, max(seqnum_client);

Note: I'm not sure about the exactly logic for Result, but max() returns the values in the question.