RegressForward - 10 months ago 57
R Question

# Sum consecutive day values

In keeping with a previous question, imagine I have a data set:

``````Date       rain code
2009-04-01  0.0 0
2009-04-02  0.0 0
2009-04-03  0.0 0
2009-04-04  0.7 1
2009-04-05 54.2 1
2009-04-06  0.0 0
2009-04-07  5.0 1
2009-04-08  9.0 0
2009-04-09  0.0 0
2009-04-10  0.0 0
2009-04-11  0.0 0
2009-04-12  5.3 1
2009-04-13 10.1 1
2009-04-14  6.0 1
2009-04-15  8.7 1
2009-04-16  0.0 0
2009-04-17  0.0 0
2009-04-18  0.0 0
2009-04-19  2.0 0
2009-04-20  3.0 0
2009-04-21  0.0 0
2009-04-22  0.0 0
2009-04-23  0.0 0
2009-04-24  0.0 0
2009-04-25  4.3 1
2009-04-26 42.2 1
2009-04-27 45.6 1
2009-04-28 12.6 1
2009-04-29  6.2 1
2009-04-30  1.0 1

DT = structure(list(Date = structure(c(14335, 14336, 14337, 14338,
14339, 14340, 14341, 14342, 14343, 14344, 14345, 14346, 14347,
14348, 14349, 14350, 14351, 14352, 14353, 14354, 14355, 14356,
14357, 14358, 14359, 14360, 14361, 14362, 14363, 14364), class = "Date"),
rain = c(0, 0, 0, 0.7, 54.2, 0, 5, 9, 0, 0, 0, 5.3, 10.1,
6, 8.7, 0, 0, 0, 2, 3, 0, 0, 0, 0, 4.3, 42.2, 45.6, 12.6,
6.2, 1), code = c(0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L,
0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
1L, 1L, 1L, 1L, 1L)), .Names = c("Date", "rain", "code"), row.names = c(NA,
-30L), class = "data.frame")
``````

I am trying to collapse the data set to get the sum of consecutive values of rain when code is 1. I need to have sum of them until the day after the event, inclusive. For example, I want to get sum of rain values from 2009-04-13 to 2009-04-06, and 2009-04-07 to 2009-04-08 separately. So I am trying to find way to define when the code is equal to 1 and the following day inclusive. The final product ought to look like:

``````Date       rain code
2009-04-01  0.0 0
2009-04-02  0.0 0
2009-04-03  0.0 0
2009-04-06  54.9 1
2009-04-08  14.0 1
2009-04-09  0.0 0
2009-04-10  0.0 0
2009-04-11  0.0 0
2009-04-16 30.1 1
2009-04-17  0.0 0
2009-04-18  0.0 0
2009-04-19  2.0 0
2009-04-20  3.0 0
2009-04-21  0.0 0
2009-04-22  0.0 0
2009-04-23  0.0 0
2009-04-24  0.0 0
2009-04-30  111.9 1  (if last entry of data frame)
``````

Any help on the above problem would be greatly appreciated.

Here's one way:

``````library(data.table)
setDT(DT)

res = DT[, .(
Date = Date[.N],
rain = sum(rain),
code = code[1L]
), by=.(g = cumsum(shift(!code, fill=FALSE)))]

res[, g := NULL]

Date  rain code
1: 2009-04-01   0.0    0
2: 2009-04-02   0.0    0
3: 2009-04-03   0.0    0
4: 2009-04-06  54.9    1
5: 2009-04-08  14.0    1
6: 2009-04-09   0.0    0
7: 2009-04-10   0.0    0
8: 2009-04-11   0.0    0
9: 2009-04-16  30.1    1
10: 2009-04-17   0.0    0
11: 2009-04-18   0.0    0
12: 2009-04-19   2.0    0
13: 2009-04-20   3.0    0
14: 2009-04-21   0.0    0
15: 2009-04-22   0.0    0
16: 2009-04-23   0.0    0
17: 2009-04-24   0.0    0
18: 2009-04-30 111.9    1
``````

How it works:

• `shift` is taking the value from the prior row
• When a logical value like `!code` is added up, TRUE/FALSE are treated as 1/0
• `.N` is the last row in the `by=` group

The general syntax is `DT[, j, by]` where `j` is computed using each `by` subset of data.

Source (Stackoverflow)