Gabor Gabor -4 years ago 36
SQL Question

group by with "holes" in the middle

Given the following table:

create table #Track (id int identity, vehicle int, station varchar(50), pieces int, distance int)
insert into #Track (vehicle, station, pieces, distance)
values
(1, 'A', 10, 0), (1, 'B', 10, 50), (1, 'C', 11, 23), (1, 'D', 11, 40), (1, 'E', 10, 5)


This is the result I need (notice the change in the pieces field between C and D):

vehicle station_from station_to pieces distance_all
1 A B 10 50
1 C D 11 63
1 E E 10 5


If I do this query:

select A.vehicle,
T1.station station_from,
T2.station station_to,
A.pieces,
A.distance_all
from (
select vehicle,
min(id) min_id,
max(id) max_id,
pieces,
sum(distance) distance_all
from #Track
group
by vehicle,
pieces
) A join #Track T1 on A.min_id = T1.id
join #Track T2 on A.max_id = T2.id


I have a wrong result (the distance_all is correct, but the station from and to aren't. It seems as the vehicle 1 was going from A to E AND then from C to D:

vehicle station_from station_to pieces distance_all
1 A E 10 55
1 C D 11 63


How to achieve the required result without using a cursor (the tables are rather big, several millions of records)

Answer Source

This is a variant of the "gaps and islands" problem. In your case, you can solve it with a difference of row numbers:

select vehicle,
       max(case when seqnum_grp = 1 then station end) as station_from,
       max(case when seqnum_grp = 1 then station end) as station_to,
       pieces,
       sum(pieces) as pieces_all
from (select t.*,
             row_number() over (partition by vehicle, pieces, (seqnum - seqnum_p) order by id) as seqnum_grp
      from (select t.*,
                   row_number() over (partition by vehicle order by id) as seqnum,
                   row_number() over (partition by vehicle, pieces order by id) as seqnum_p
           from #Track t
          ) t
     ) t
group by vehicle, pieces, (seqnum - seqnum_p);

To understand how this works, you need to understand why the difference in row numbers identifies the groups. For that you need to just run the innermost subquery and stare at the results.

This is a bit trickier than most such problems because you want the first and last stations along the way. Hence there is an extra subquery.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download