Naveen Naveen - 4 months ago 8
SQL Question

Normalizing the data from denormalized table

I have data in my table like this

RepID|Role|Status|StartDate |EndDate |
-----|----|------|----------|----------|
10001|R1 |Active|01/01/2015|01/31/2015|
-----|----|------|----------|----------|
10001|R1 |Leavee|02/01/2015|02/12/2015|
-----|----|------|----------|----------|
10001|R1 |Active|02/13/2015|02/28/2015|
-----|----|------|----------|----------|
10001|R2 |Active|03/01/2015|03/18/2015|
-----|----|------|----------|----------|
10001|R2 |Leave |03/19/2015|04/10/2015|
-----|----|------|----------|----------|
10001|R2 |Active|04/11/2015|05/10/2015|
-----|----|------|----------|----------|
10001|R1 |Active|05/11/2015|06/13/2015|
-----|----|------|----------|----------|
10001|R1 |Leave |06/14/2015|12/31/9998|
-----|----|------|----------|----------|


I am looking for the output like this,

RepID|Role|StartDate |EndDate |
-----|----|----------|----------|
10001|R1 |01/01/2015|02/28/2015|
-----|----|----------|----------|
10001|R2 |03/01/2015|05/10/2015|
-----|----|----------|----------|
10001|R1 |05/11/2015|12/31/9998|
-----|----|----------|----------|


Whenever only the role change happens, I need to capture the start and EndDate. I tried different ways but couldn't get the output.

Any help is appreciated.

Below is the SQL i tried with but it doesnt help,

SELECT T1.RepID, T1.Role, Min(T1.StartDate) AS StartDate, Max(T1.EndDate) AS EndDate
FROM
(SELECT rD1.RepID, rD1.Role, rD1.StartDate, rD1.EndDate
FROM repDetails rD1
INNER JOIN repDetails rD2
ON rD2.RepID = rD1.RepID AND rD2.StartDate = DateAdd (Day, 1, rD1.EndDate) AND (rD2.Role = rD1.Role OR (rD2.Role IS NULL AND rD1.Role IS NULL) OR (rD2.Role = '' AND rD1.Role = ''))

UNION

SELECT rD2.RepID, rD2.Role, rD2.StartDate, rD2.EndDate
FROM repDetails rD1
INNER JOIN repDetails rD2
ON rD2.RepID = rD1.RepID AND rD2.StartDate = DateAdd (Day, 1, rD1.EndDate) AND (rD2.Role = rD1.Role OR (rD2.Role IS NULL AND rD1.Role IS NULL) OR (rD2.Role = '' AND rD1.Role = ''))
) T1
GROUP BY T1.RepID, T1.Role

UNION

SELECT EP.RepID, EP.Role AS DataValue, EP.StartDate, EP.EndDate
FROM repDetails EP
LEFT OUTER JOIN
(SELECT rD1.RepID, rD1.Role, rD1.StartDate, rD1.EndDate
FROM repDetails rD1
INNER JOIN repDetails rD2
ON rD2.RepID = rD1.RepID AND rD2.StartDate = DateAdd (Day, 1, rD1.EndDate) AND (rD2.Role = rD1.Role OR (rD2.Role IS NULL AND rD1.Role IS NULL) OR (rD2.Role = '' AND rD1.Role = ''))

UNION

SELECT rD2.RepID, rD2.Role , rD2.StartDate, rD2.EndDate
FROM repDetails rD1
INNER JOIN repDetails rD2
ON rD2.RepID = rD1.RepID AND rD2.StartDate = DateAdd (Day, 1, rD1.EndDate) AND (rD2.Role = rD1.Role OR (rD2.Role IS NULL AND rD1.Role IS NULL) OR (rD2.Role = '' AND rD1.Role = ''))
) T1
ON EP.RepID = T1.RepID AND EP.StartDate = T1.StartDate
WHERE T1.RepID IS NULL

vkp vkp
Answer

The key here is to identify continuous rows until the role changes. This can be done by comparing the next row's role using the lead function and some additional logic to categorize all the previous rows into the same group.

After classifying them into groups, you just need to use min and max to get the start and end dates.

with groups as (
select x.*
,case when grp = 1 then 0 else 1 end + sum(grp) over(partition by repid order by startdate) grps
from (select t.*
      ,case when lead(role) over(partition by repid order by startdate) = role then 0 else 1 end grp
      from t) x
)
select distinct repid,role
,min(startdate) over(partition by repid,grps) startdt
,max(enddate) over(partition by repid,grps) enddt
from groups
order by 1,3

Sample demo