Mahdi Jadaliha - 1 year ago 44

R Question

Here is the simple problem I'm trying to solve: I have a data.table like following table, and I'm trying to use

`dcast.data.table`

`median`

`set.seed(10);`

DT = data.table(GROUP = sample(c("a","b","c"),100,replace = T),

ADVANCED = sample(c("ADVANCED","DROP"),100,replace = T),

GRADE = sample(1:10,100, replace=T))

GROUP ADVANCED GRADE

1: b ADVANCED 3

2: a ADVANCED 6

3: b ADVANCED 7

4: c ADVANCED 9

95: b DROP 6

96: c ADVANCED 5

97: a DROP 10

98: b ADVANCED 1

99: c DROP 6

100: a DROP 2

GROUP ADVANCED GRADE

Essentially here is the result I'm looking for:

`result = merge(`

dcast.data.table(DT,.Primitive("~")(GROUP,ADVANCED)),

dcast.data.table(DT,.Primitive("~")(GROUP,.),

value.var="GRADE",

fun.aggregate=median));

setnames(result,".","MEDIAN_GRADE")

GROUP ADVANCED DROP MEDIAN_GRADE

1: a 17 19 6

2: b 20 21 7

3: c 13 10 6

Now I'm wondering how can I do it without making two separate dcast tables and merge at the end. I'm dealing with many row and column in my tables and grouping by key is a bottleneck. I'm wondering is there a better way to calculate this?

** Since my first question was vague I edit completely (thanks to Frank and Akrun for their feedback).

Answer Source

For the updated question

```
setnames(dcast(DT, GROUP~ADVANCED, length)[dcast(DT, GROUP~., median),
on = "GROUP"], ".", "MEDIAN_GRADE")[]
# GROUP ADVANCED DROP MEDIAN_GRADE
#1: a 17 19 6
#2: b 20 21 7
#3: c 13 10 6
```

Or a faster approach would be to group by 'GROUP', get the `median`

of 'GRADE' and then do the join `on`

the `dcast`

output

```
DT[,.(MEDIAN_GRADE = median(GRADE)) , .(GROUP)][
dcast(DT, GROUP ~ ADVANCED, length), on = 'GROUP']
```