newbie - 10 months ago 73

R Question

I have a really big problem and looping through the data.table to do what I want is too slow, so I am trying to get around looping. Let assume I have a data.table as follows:

`a <- data.table(i = c(1,2,3), j = c(2,2,6), k = list(c("a","b"),c("a","c"),c("b")))`

> a

i j k

1: 1 2 a,b

2: 2 2 a,c

3: 3 6 b

And I want to group based on the values in k. So something like this:

`a[, sum(j), by = k]`

right now I am getting the following error:

`Error in `[.data.table`(a, , sum(i), by = k) :`

The items in the 'by' or 'keyby' list are length (2,2,1). Each must be same length as rows in x or number of rows returned by i (3).

The answer I am looking for is to group first all the rows having "a" in column k and calculate sum(j) and then all rows having "b" and so on. So the desired answer would be:

`k V1`

a 4

b 8

c 2

Any hint how to do it efficiently? I cant melt the column K by repeating the rows since the size of the data.table would be too big for my case.

Answer

I think this might work:

```
a[, .(k = unlist(k)), by=.(i,j)][,sum(j),by=k]
k V1
1: a 4
2: b 8
3: c 2
```

Source (Stackoverflow)