thelatemail - 9 months ago 63

R Question

I'm trying to simplify this

`data.table`

`textvar`

`sum`

`library(data.table)`

dt <- data.table(grpvar=letters[c(1,1,2)], textvar=c("one","two","one"),

numvar=1:3, othernum=2:4)

dt

# grpvar textvar numvar othernum

#1: a one 1 2

#2: a two 2 3

#3: b one 3 4

Now my first thought was to nest

`.SD`

`lapply`

`dt[, c(textvar=textvar[1], .SD[, lapply(.SD, sum), .SDcols=-c("textvar")]), by=grpvar]`

# grpvar textvar numvar othernum

#1: a one 3 5

#2: b one 3 4

Then I thought maybe I could do each grouping separately and join them, but that seems even worse:

`dt[, .(textvar=textvar[1]), by=grpvar][`

dt[, lapply(.SD, sum), by=grpvar, .SDcols=-c("textvar")], on="grpvar"

]

# grpvar textvar numvar othernum

#1: a one 3 5

#2: b one 3 4

Is there a simpler construction that would get around the nesting of

`.SD`

Answer

The `j`

-argument in *data.table* is (deliberately) quite flexible. All we need to remember is that:

As long as

`j`

returns alist, each element of the list will become a column in the resulting data.table.

Using the fact that `c(list, list)`

is a `list`

, we can construct the expression as follows:

```
dt[, c(textvar = textvar[1L], lapply(.SD, sum)), # select/compute all cols necessary
.SDcols = numvar:othernum, # provide .SD's columns
by = grpvar] # group by 'grpvar'
# grpvar textvar numvar othernum
# 1: a one 3 5
# 2: b one 3 4
```

Here, I've not wrapped the first expression with `list()`

since `textvar[1L]`

returns a length=1 vector.. i.e., `identical(c(1, list(2, 3)), c(list(1), list(2,3)))`

is `TRUE`

.

Note that this is only possible from `v1.9.7`

. The bug was just recently fixed in the current development version.

Source (Stackoverflow)