Christoph P. - 1 month ago
R Question

Summing over constant calendar week interval

I am currently trying to aggregate weekly data to monthly data. The data looks like this:

UPC WEEK AP
1111112016 1 385.22
1111112016 2 221.63
1111112016 3 317.47


There are 132 different UPCs, and weeks are numbered 1 to 52, although the weeks present vary across UPCs. In total I have 4,027 rows.
I would like to sum AP over 4-week intervals within each UPC, starting over when the next UPC is reached. I have tried this code:

z = aggregate(x$AP, by=list(x$UPC, cut(x$WEEK, breaks=13, lables = T)), FUN = sum)
colnames(z) = c("UPC", "Month", "AP")
z = z[order(z$UPC),]


I get the following output:

UPC Month AP
1 1111112016 (0.951,4.77] 1098.03
88 1111112016 (4.77,8.54] 1180.03
187 1111112016 (8.54,12.3] 491.18
303 1111112016 (12.3,16.1] 896.31


There are two problems here:
1) The Month value is wrong: it is an interval label. I would like a numeric value (1 - 12).
2) The first two aggregates are correct, but after that the sums are sometimes correct and sometimes not.

Here is a brief example of what my data looks like:

dput(head(x))
structure(list(UPC = c(1111112016, 1111112016, 1111112016, 1111112016,
1111112016, 1111112016), WEEK = c(1, 2, 3, 4, 5, 6), AP = c(385.22,
221.63, 317.47, 173.71, 269.55, 311.48)), .Names = c("UPC", "WEEK",
"AP"), row.names = c(NA, 6L), class = "data.frame")

Answer

Would something like this work (where data is your data frame)?

require(data.table)
setDT(data)
result <- data[, .(AP = sum(AP, na.rm = TRUE)), by = .(UPC, MONTH = floor(WEEK / 4.34) + 1)]
result <- result[order(UPC)]

For the six sample rows from the question, the result will be:

        UPC   MONTH    AP
1: 1111112016     1 1098.03
2: 1111112016     2  581.03
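
To see what the grouping rule does, here is a minimal base R sketch. It first checks how floor(WEEK / 4.34) + 1 spreads weeks 1 to 52 over months (4.34 is roughly 52 weeks / 12 months), then shows an alternative that uses strict 4-week blocks via integer division. The PERIOD column and the strict 4-week rule are assumptions added for illustration, not part of the answer above. The sample rows are copied from the dput() output in the question.

```r
# Sample rows copied from the dput() output in the question.
x <- data.frame(
  UPC  = rep(1111112016, 6),
  WEEK = 1:6,
  AP   = c(385.22, 221.63, 317.47, 173.71, 269.55, 311.48)
)

# Rule from the answer: 4.34 is roughly 52 weeks / 12 months.
weeks <- 1:52
month <- floor(weeks / 4.34) + 1
weeks_per_month <- table(month)  # how many weeks land in each month

# Alternative (assumption): strict 4-week blocks via integer division.
# Weeks 1-4 -> 1, weeks 5-8 -> 2, ..., weeks 49-52 -> 13.
x$PERIOD <- (x$WEEK - 1) %/% 4 + 1

# Sum AP within each UPC and 4-week period.
z <- aggregate(AP ~ UPC + PERIOD, data = x, FUN = sum)
z <- z[order(z$UPC, z$PERIOD), ]

print(weeks_per_month)
print(z)
```

The two rules trade off differently: the 4.34 divisor yields months 1 to 12 but some months hold five weeks, while strict 4-week blocks are always the same length but give 13 periods per year. By contrast, cut(x$WEEK, breaks = 13) splits the observed range of WEEK into 13 equal-width bins, which is why the bin edges in the question fall on fractional values such as 4.77.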