Noobie Noobie - 26 days ago 7
R Question

dplyr, lubridate : how to aggregate a dataframe by week?

Consider the following example

library(tidyverse)
library(lubridate)
time <- seq(from =ymd("2014-02-24"),to= ymd("2014-03-20"), by="days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- data_frame(time, values)
df2 <- df2 %>% mutate(day_of_week = wday(time, label = TRUE))

Source: local data frame [25 x 3]

time values day_of_week
<date> <dbl> <fctr>
1 2014-02-24 30 Mon
2 2014-02-25 45 Tues
3 2014-02-26 30 Wed
4 2014-02-27 50 Thurs
5 2014-02-28 50 Fri
6 2014-03-01 20 Sat
7 2014-03-02 35 Sun
8 2014-03-03 50 Mon
9 2014-03-04 35 Tues
10 2014-03-05 35 Wed


I would like to aggregate this dataframe by week.

That is, suppose I define a week as starting on Monday morning and ending on Sunday evening, which we will call a
Monday to Monday
cycle. (importantly, I want to be able to choose other conventions, such as Friday to Friday for instance).

Then, I would simply like to count the mean of
values
for each week.

For instance, in the example above, one would compute the average of
values
between Monday February 24th to Sunday March 2nd, and so on.

How can I do that?

Thanks!

EDIT: thanks to all of you who contributed an idea. Somewhat unusual, I think my late solution is probably more appropriate here. Thanks again!

Answer

Just this once, after some research, I actually think I came up with a better solution that

  • gives the correct aggregation
  • gives the correct labels

Example below for weeks starting on a thursday. The weeks will be labeled by their first day a given cycle.

library(tidyverse)
library(lubridate)
options(tibble.print_min = 30)

time <- seq(from =ymd("2014-02-24"),to= ymd("2014-03-20"), by="days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- data_frame(time, values)

df2 <- df2 %>% mutate(day_of_week_label = wday(time, label = TRUE),
                      day_of_week = wday(time, label = FALSE))

df2 <- df2 %>% mutate(thursday_cycle = time - ((as.integer(day_of_week) - 5) %% 7),
                      tmp_1 = (as.integer(day_of_week) - 5),
                      tmp_2 = ((as.integer(day_of_week) - 5) %% 7))

which gives

> df2
# A tibble: 25 × 7
         time values day_of_week_label day_of_week thursday_cycle tmp_1 tmp_2
       <date>  <dbl>             <ord>       <dbl>         <date> <dbl> <dbl>
1  2014-02-24     30               Mon           2     2014-02-20    -3     4
2  2014-02-25     45              Tues           3     2014-02-20    -2     5
3  2014-02-26     30               Wed           4     2014-02-20    -1     6
4  2014-02-27     50             Thurs           5     2014-02-27     0     0
5  2014-02-28     50               Fri           6     2014-02-27     1     1
6  2014-03-01     20               Sat           7     2014-02-27     2     2
7  2014-03-02     35               Sun           1     2014-02-27    -4     3
8  2014-03-03     50               Mon           2     2014-02-27    -3     4
9  2014-03-04     35              Tues           3     2014-02-27    -2     5
10 2014-03-05     35               Wed           4     2014-02-27    -1     6
11 2014-03-06     50             Thurs           5     2014-03-06     0     0
12 2014-03-07     35               Fri           6     2014-03-06     1     1
13 2014-03-08     40               Sat           7     2014-03-06     2     2
14 2014-03-09     40               Sun           1     2014-03-06    -4     3
15 2014-03-10     20               Mon           2     2014-03-06    -3     4
16 2014-03-11     50              Tues           3     2014-03-06    -2     5
17 2014-03-12     25               Wed           4     2014-03-06    -1     6
18 2014-03-13     20             Thurs           5     2014-03-13     0     0
19 2014-03-14     30               Fri           6     2014-03-13     1     1
20 2014-03-15     50               Sat           7     2014-03-13     2     2
21 2014-03-16     50               Sun           1     2014-03-13    -4     3
22 2014-03-17     40               Mon           2     2014-03-13    -3     4
23 2014-03-18     40              Tues           3     2014-03-13    -2     5
24 2014-03-19     50               Wed           4     2014-03-13    -1     6
25 2014-03-20     40             Thurs           5     2014-03-20     0     0