Mahdi Jadaliha - 2 months ago 12
R Question

# Combine several dcast data.table (which share same key) efficiently

Here is the simple problem I'm trying to solve: I have a data.table like following table, and I'm trying to use

`dcast.data.table`
function to calculate number of advancement for each group, but also I'm interested to calculate
`median`

``````set.seed(10);
DT = data.table(GROUP = sample(c("a","b","c"),100,replace = T),

95:     b     DROP     6
97:     a     DROP    10
99:     c     DROP     6
100:     a     DROP     2
``````

Essentially here is the result I'm looking for:

``````result = merge(
dcast.data.table(DT,.Primitive("~")(GROUP,.),
fun.aggregate=median));

1:     a       17   19            6
2:     b       20   21            7
3:     c       13   10            6
``````

Now I'm wondering how can I do it without making two separate dcast tables and merge at the end. I'm dealing with many row and column in my tables and grouping by key is a bottleneck. I'm wondering is there a better way to calculate this?

** Since my first question was vague I edit completely (thanks to Frank and Akrun for their feedback).

For the updated question

``````setnames(dcast(DT, GROUP~ADVANCED, length)[dcast(DT, GROUP~., median),
Or a faster approach would be to group by 'GROUP', get the `median` of 'GRADE' and then do the join `on` the `dcast` output
``````DT[,.(MEDIAN_GRADE = median(GRADE)) , .(GROUP)][