user299791 - 1 year ago 43
R Question

I have a data frame like:

``````> prova
sent weeknumber processed
1  100          1         1
2   23          1         0
3  254          1         1
4  321          2         0
5 1241          2         0
6  323          2         1
7 1221          3         1

structure(list(sent = c(100, 23, 254, 321, 1241, 323, 1221),
weeknumber = c(1, 1, 1, 2, 2, 2, 3), processed = c(1, 0,
1, 0, 0, 1, 1)), .Names = c("sent", "weeknumber", "processed"
), row.names = c(NA, -7L), class = "data.frame")
``````

If I want to extract the number of Sent by week number for rows with processed = 0, I can do:

``````aggregate(prova[prova\$processed==0,]\$sent, by=list(prova[prova\$processed==0,]\$weeknumber), FUN = sum)
Group.1    x
1       1   23
2       2 1562
``````

And if I want to extract the sum of Sent by week number when processed = 1 I do:

``````aggregate(prova[prova\$processed==1,]\$sent, by=list(prova[prova\$processed==1,]\$weeknumber), FUN = sum)
Group.1    x
1       1  354
2       2  323
3       3 1221
``````

However, I'd like to find a way to have always the same result length, i.e. in case of processed=0, something like this:

``````  Group.1    x
1       1   23
2       2 1562
3       3    0  // this is the new row I'd like to add
``````

If I simply pass the whole list of possible week numbers, I get:

``````aggregate(prova[prova\$processed==0,]\$sent, by=list(prova\$weeknumber), FUN = sum)
Error in aggregate.data.frame(as.data.frame(x), ...) :
arguments must have same length
``````

We can use an `if/else` condition with `data.table`. Convert the 'data.frame' to 'data.table' (`setDT(prova)`), grouped by 'weeknumber', `if` there are not `any` 0 values in 'processed', return 0 or `else` get the `sum` of 'sent' where 'processed' is 0.

``````library(data.table)
setDT(prova)[, .(sent = if(!any(processed==0)) 0
else sum(sent[processed==0])), by = weeknumber]
#   weeknumber sent
#1:          1   23
#2:          2 1562
#3:          3    0
``````

But, if we need the `sum` of 'sent' for each value of 'processed' grouped by 'weeknumber', a convenient option is `dcast`

``````dcast(setDT(prova), weeknumber~processed, value.var="sent", sum)
#  weeknumber    0    1
#1:          1   23  354
#2:          2 1562  323
#3:          3    0 1221
``````

Or with `xtabs` from `base R` which also does the `sum` of 'sent' for each of the combinations of 'weeknumber' with 'processed'.

``````xtabs(sent~weeknumber + processed, prova)
``````

If we are using `aggregate`, one option is `merge` the output of `aggregate` with the `unique` set of 'weeknumber' and then replace the `NA` elements in 'sent' to 0.

``````res <- merge(data.frame(weeknumber = unique(prova\$weeknumber)),
aggregate(sent~weeknumber, prova, subset = processed ==0, FUN = sum),
all.x=TRUE)
res\$sent[is.na(res\$sent)] <- 0
res
#   weeknumber sent
#1          1   23
#2          2 1562
#3          3    0
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download