newbie newbie - 2 months ago 13
R Question

Group a data.table using a column which is list

I have a really big problem and looping through the data.table to do what I want is too slow, so I am trying to get around looping. Let assume I have a data.table as follows:

a <- data.table(i = c(1,2,3), j = c(2,2,6), k = list(c("a","b"),c("a","c"),c("b")))

> a
i j k
1: 1 2 a,b
2: 2 2 a,c
3: 3 6 b

And I want to group based on the values in k. So something like this:

a[, sum(j), by = k]

right now I am getting the following error:

Error in `[.data.table`(a, , sum(i), by = k) :
The items in the 'by' or 'keyby' list are length (2,2,1). Each must be same length as rows in x or number of rows returned by i (3).

The answer I am looking for is to group first all the rows having "a" in column k and calculate sum(j) and then all rows having "b" and so on. So the desired answer would be:

k V1
a 4
b 8
c 2

Any hint how to do it efficiently? I cant melt the column K by repeating the rows since the size of the data.table would be too big for my case.


I think this might work:

a[, .(k = unlist(k)), by=.(i,j)][,sum(j),by=k]

   k V1
1: a  4
2: b  8
3: c  2