newbie - 1 year ago 114
R Question

# Group a data.table using a column which is list

I have a really big problem and looping through the data.table to do what I want is too slow, so I am trying to get around looping. Let assume I have a data.table as follows:

``````a <- data.table(i = c(1,2,3), j = c(2,2,6), k = list(c("a","b"),c("a","c"),c("b")))

> a
i j   k
1: 1 2 a,b
2: 2 2 a,c
3: 3 6   b
``````

And I want to group based on the values in k. So something like this:

``````a[, sum(j), by = k]
``````

right now I am getting the following error:

`````` Error in `[.data.table`(a, , sum(i), by = k) :
The items in the 'by' or 'keyby' list are length (2,2,1). Each must be same length as rows in x or number of rows returned by i (3).
``````

The answer I am looking for is to group first all the rows having "a" in column k and calculate sum(j) and then all rows having "b" and so on. So the desired answer would be:

``````k V1
a 4
b 8
c 2
``````

Any hint how to do it efficiently? I cant melt the column K by repeating the rows since the size of the data.table would be too big for my case.

``````a[, .(k = unlist(k)), by=.(i,j)][,sum(j),by=k]