Sami Sami - 5 days ago 4
R Question

How to sum up specific cells of a column in a data frame

I have a data frame named BalticRainfallDuration. A small portion of the data frame is shown below.

"TIMESTAMP" "Rainfall" "Duration"
2014-03-19 10:40:00 0.508 0
2014-03-19 10:50:00 1.016 10
2014-03-19 11:00:00 0.254 10
2014-03-24 09:10:00 0.254 7090
2014-03-26 12:40:00 0.254 3090
2014-03-27 11:50:00 0.254 1390
2014-03-27 12:20:00 0.254 30
2014-03-28 14:30:00 0.254 1570
2014-03-28 14:40:00 0.508 10
2014-03-28 14:50:00 0.508 10
2014-03-28 15:00:00 0.254 10
2014-03-28 15:10:00 0.508 10
2014-03-28 15:20:00 0.254 10
2014-03-28 15:40:00 0.254 20
2014-03-29 13:00:00 0.254 1280


For each event of continuous rainfall recorded at 10-minute intervals in TIMESTAMP, I want to sum the "Duration" values belonging to that event. The output data frame "Event_Duration" should look as follows:

"TIMESTAMP" "Rainfall" "Duration" "Duration_sum"
2014-03-19 10:40:00 0.508 0
2014-03-19 10:50:00 1.016 10 20
2014-03-19 11:00:00 0.254 10
2014-03-24 09:10:00 0.254 7090 NA
2014-03-26 12:40:00 0.254 3090 NA
2014-03-27 11:50:00 0.254 1390 NA
2014-03-27 12:20:00 0.254 30 NA
2014-03-28 14:30:00 0.254 1570 NA
2014-03-28 14:40:00 0.508 10
2014-03-28 14:50:00 0.508 10
2014-03-28 15:00:00 0.254 10 50
2014-03-28 15:10:00 0.508 10
2014-03-28 15:20:00 0.254 10
2014-03-28 15:40:00 0.254 20 NA
2014-03-29 13:00:00 0.254 1280 NA


This means that there were two events of continuous rainfall recorded at 10-minute intervals: event 1 lasted 20 minutes and event 2 lasted 50 minutes.

I tried the following code:

Event_Duration<-with(BalticRainfallDuration,diff(BalticRainfallDuration$TIMESTAMP)==10, sum(BalticRainfallDuration$Duration))

Duration_Sum<-data.frame(cbind(BalticRainfallDuration,Event_Duration))


But the output contains only TRUE or FALSE values rather than the duration sums.
I would highly appreciate it if someone could help by suggesting corrections to my code or providing alternative code.

Answer

How about this:

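library( dplyr )
# df is assumed to be the question's data frame (BalticRainfallDuration)
df <- df %>%
    # flag rows whose gap from the previous reading exceeds 10 minutes
    mutate( grp = ifelse( Duration > 10, 1, 0 ) ) %>%
    # running count of the flags: each flagged row opens a new group
    mutate( grp = cumsum( grp ) ) %>%
    group_by( grp ) %>%
    # last row of each group gets the group total, excluding the gap before its first row
    mutate( Duration_sum = c( rep( NA, n() - 1 ), sum( Duration ) - Duration[1] ) ) %>%
    ungroup() %>%
    # drop the helper column
    mutate( grp = NULL )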

Which gives:

> df
# A tibble: 15 × 4
             TIMESTAMP Rainfall Duration Duration_sum
                <dttm>    <dbl>    <int>        <int>
1  2014-03-19 10:40:00    0.508        0           NA
2  2014-03-19 10:50:00    1.016       10           NA
3  2014-03-19 11:00:00    0.254       10           20
4  2014-03-24 09:10:00    0.254     7090            0
5  2014-03-26 12:40:00    0.254     3090            0
6  2014-03-27 11:50:00    0.254     1390            0
7  2014-03-27 12:20:00    0.254       30            0
8  2014-03-28 14:30:00    0.254     1570           NA
9  2014-03-28 14:40:00    0.508       10           NA
10 2014-03-28 14:50:00    0.508       10           NA
11 2014-03-28 15:00:00    0.254       10           NA
12 2014-03-28 15:10:00    0.508       10           NA
13 2014-03-28 15:20:00    0.254       10           50
14 2014-03-28 15:40:00    0.254       20            0
15 2014-03-29 13:00:00    0.254     1280            0
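
To see why the grouping produces these numbers, here is a minimal base R sketch of the same idea without dplyr. It assumes that Duration holds the minutes elapsed since the previous reading, which is how the sample data reads; the names grp and event_sum are only illustrative.

```r
# Duration values retyped from the question's sample (assumed to be integer minutes)
Duration <- c(0L, 10L, 10L, 7090L, 3090L, 1390L, 30L, 1570L,
              10L, 10L, 10L, 10L, 10L, 20L, 1280L)

# Every gap longer than 10 minutes starts a new group, so a run of
# 10-minute readings stays in the group opened by the row before it
grp <- cumsum(Duration > 10)

# Total per group, minus the first row's value (the gap before the event began)
event_sum <- tapply(Duration, grp, function(d) sum(d) - d[1])

print(grp)
print(event_sum)
```

Groups that contain a single row come out as 0, which is where the zeros in the output above come from.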

Some of these steps may be unnecessary, depending on how you want the output to look. Likewise, you may want to get rid of the zero values in Duration_sum (just one extra mutate step at the end).
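
One possible form of that extra step is sketched below, using dplyr's na_if() to turn the zeros into NA. The data frame is retyped from the question's sample; the column types (integer Duration, POSIXct TIMESTAMP in UTC) and the name result are assumptions, so adjust them to match the real data.

```r
library(dplyr)

# Sample data retyped from the question (types are an assumption)
df <- data.frame(
  TIMESTAMP = as.POSIXct(c(
    "2014-03-19 10:40:00", "2014-03-19 10:50:00", "2014-03-19 11:00:00",
    "2014-03-24 09:10:00", "2014-03-26 12:40:00", "2014-03-27 11:50:00",
    "2014-03-27 12:20:00", "2014-03-28 14:30:00", "2014-03-28 14:40:00",
    "2014-03-28 14:50:00", "2014-03-28 15:00:00", "2014-03-28 15:10:00",
    "2014-03-28 15:20:00", "2014-03-28 15:40:00", "2014-03-29 13:00:00"
  ), tz = "UTC"),
  Rainfall = c(0.508, 1.016, 0.254, 0.254, 0.254, 0.254, 0.254, 0.254,
               0.508, 0.508, 0.254, 0.508, 0.254, 0.254, 0.254),
  Duration = c(0L, 10L, 10L, 7090L, 3090L, 1390L, 30L, 1570L,
               10L, 10L, 10L, 10L, 10L, 20L, 1280L)
)

result <- df %>%
  # same grouping as the answer, written in one step
  mutate(grp = cumsum(Duration > 10)) %>%
  group_by(grp) %>%
  mutate(Duration_sum = c(rep(NA, n() - 1), sum(Duration) - Duration[1])) %>%
  ungroup() %>%
  # the extra step: single-row groups produce 0, so replace those with NA
  mutate(Duration_sum = na_if(Duration_sum, 0L)) %>%
  select(-grp)

print(result)
```

With the sample data, only the last row of each multi-row event keeps a value (20 and 50); every other row is NA.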
