Jeff Jeff - 5 months ago 18
SQL Question

Oracle SQL ROW_NUMBER() window with conditions based on lag

Using Oracle SQL with select only privileges, I need to provide

ROW_NUMBER
output based on conditions. using a cursor or a loop this would be easy rather simple, but presently I must perform this task using only SQL.

I've been tinkering with a
row_number()
over
clause which I believe is the right way to go, but I'm stuck now.

My current code - or at least a proxy of it:

WITH MYTABLE (FK_ID,FK_NAME,PK_ID,BIN_FLAG,MONTH,YEAR)AS (
SELECT 10000,'VARCHAR DESCRIPTION',75057,1,1,2016 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,1,2,2016 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,1,3,2016 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,0,4,2016 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,1,5,2016 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,0,6,2016 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,0,7,2016 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,1,8,2016 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,0,9,2016 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,0,10,2016 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,1,11,2016 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,0,12,2016 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,0,1,2017 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,0,2,2017 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,0,3,2017 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,0,4,2017 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,0,5,2017 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,0,6,2017 FROM DUAL
UNION SELECT 10000,'VARCHAR DESCRIPTION',75057,0,7,2017 FROM DUAL

)

SELECT
FK_ID
, FK_NAME
, PK_ID
, BIN_FLAG
, ROW_NUMBER() OVER (PARTITION BY FK_ID,PK_ID,BIN_FLAG ORDER BY YEAR,MONTH,FK_ID,PK_ID ASC) NEEDED_CALC
, MONTH
, YEAR
FROM MYTABLE
ORDER BY FK_ID,PK_ID,YEAR,MONTH


This will return a dataset that looks something like this:

FK_ID FK_NAME PK_ID BIN_FLAG NEEDED_CALC MONTH YEAR
10000 VARCHAR DESCRIPTION 75057 1 1 1 2016
10000 VARCHAR DESCRIPTION 75057 1 2 2 2016
10000 VARCHAR DESCRIPTION 75057 1 3 3 2016
10000 VARCHAR DESCRIPTION 75057 0 1 4 2016
10000 VARCHAR DESCRIPTION 75057 1 4 5 2016
10000 VARCHAR DESCRIPTION 75057 0 2 6 2016
10000 VARCHAR DESCRIPTION 75057 0 3 7 2016
10000 VARCHAR DESCRIPTION 75057 1 5 8 2016
10000 VARCHAR DESCRIPTION 75057 0 4 9 2016
10000 VARCHAR DESCRIPTION 75057 0 5 10 2016
10000 VARCHAR DESCRIPTION 75057 1 6 11 2016
10000 VARCHAR DESCRIPTION 75057 0 6 12 2016
10000 VARCHAR DESCRIPTION 75057 0 7 1 2017
10000 VARCHAR DESCRIPTION 75057 0 8 2 2017
10000 VARCHAR DESCRIPTION 75057 0 9 3 2017
10000 VARCHAR DESCRIPTION 75057 0 10 4 2017
10000 VARCHAR DESCRIPTION 75057 0 11 5 2017
10000 VARCHAR DESCRIPTION 75057 0 12 6 2017
10000 VARCHAR DESCRIPTION 75057 0 13 7 2017


What I need is the
NEEDED_CALC
column to reset the calc anytime the prior month's
bin_flag
changes.

So, if
bin_flag = 1
and the prior month's
bin_flag
(identified through a
lag
function) is different, then the counter column in
NEEDED_CALC
will reset and begin again at
1
.

Answer

This is a "gaps and islands" problem. The key is assigning a group identifier to adjacent rows with the same value. There are two easy methods to do this: one is based on lag() and the other is the difference of row_number()s.

The second is a wee bit simpler, only requiring one level of subquery:

select t.*,
       row_number() over (partition by fk_id, bin_flag, seqnum_ym - seqnum_bym
                          order by year, month
                         ) as needed_calc
from (select t.*,
             row_number() over (partition by fk_id order by year, month) as seqnum_ym,
             row_number() over (partition by fk_id order by bin_flag, year, month) as seqnum_bym
      from mytable t
     ) t;

The difference of row numbers is not hard to understand but it does require a conceptual leap. I would suggest that you run the subquery and see what the values are for seqnum_ym and seqnum_bym to understand how it works.