NBC NBC - 1 month ago 12
R Question

Complicated table calculations [R]

I have a dataframe df_workingFile:

group | id | flag | value |
a | a | 0 | 5 |
a | c | 0 | 2 |
b | b | 0 | 8 |
b | e | 1 | 1 |


I want to sum the value column based on group, but only on the row where group = id (there can only be one of these instances by group). I also don't want to include any values in the sum that have a flag:

group | id | flag | value | ValueSum
a | a | 0 | 5 | 7
a | c | 0 | 2 | 0
b | b | 0 | 8 | 8
b | e | 1 | 1 | 0


Input data:

df_workingFile <- structure(list(group = structure(c(1L, 1L, 2L, 2L), .Label = c("a",
"b"), class = "factor"), id = structure(c(1L, 3L, 2L, 4L), .Label = c("a",
"b", "c", "e"), class = "factor"), flag = c(0, 0, 0, 1), value = c(5,
2, 8, 1)), .Names = c("group", "id", "flag", "value"), row.names = c(NA,
4L), class = "data.frame")

Answer Source

This is quite simple with the data.table package. Here is my solution:

library(data.table)

temp <- data.table(group = c("a","a","b","b"),
           id = c("a","c","b","d"),
           flag = c(0,0,0,1),
           value = c(5,2,8,1))

temp[flag != 1,ValueSum := ifelse(group == id,sum(value), NA), by = group]

Before the first comma, you are able to specify a logical condition. In the middle you can do column assignment (I'm sure there's a cleaner solution in this case, but it works). After the second comma, you can specify a group-by statement. I would highly recommend getting familiar with the package, it has made my life far easier.