user3463225 - 1 year ago 125
R Question

# R: merge data.tables to create nested data.table with list for each row

The following code creates two data tables, with the same column id_g which is presented in different ways.

``````table1 <- data.table(id=c(1:3), id_g = vector('list', 3L))
set(table1, j='id_g', value=list(c(1:3), c(4:6), 7))
table1
#   id  |  id_g
#   1   |  1,2,3
#   2   |  4,5
#   3   |  6,7

table2 <- data.table(id_g = c(1:7), id_values = vector('list', 7L))
set(table2, j='id_values', value= list(c(1:5), c(1:2), 1, 2, c(3:5), 3, c(1:2)))
table2
#   id_g  |  id_values
#   1     |  1,2,3,4,5
#   2     |  1,2
#   3     |  1
#   4     |  2
#   5     |  3,4,5
#   6     |  3
#   7     |  1,2
``````

Expected result:

``````table_all
#   id   |  id_g
#   1    | <data.table>
#   2    | <data.table>
#   3    | <data.table>
``````

where for example data.table for id = 1 is a list of three vectors: c(1:5), c(1,2), 1.

What is the most efficient way to join tables by id_g and create nested tables for each id? Nested tables should be a list of separate vectors with possibly different length.

There is bound to be a better way to select the rows of table2 wich have `id_g %in% v`, but try this for a start:

``````table_all <- table1
table_all\$id_g <- lapply(table_all\$id_g, function(v){table2[id_g %in% v,id_values]})
table_all
#   id   id_g
#1:  1 <list>
#2:  2 <list>
#3:  3 <list>
table_all\$id_g[1]
#[[1]]
#[[1]][[1]]
#[1] 1 2 3 4 5
#
#[[1]][[2]]
#[1] 1 2
#
#[[1]][[3]]
#[1] 1
``````

Pretty much the same, with `id_g` having `data.table` elements instead of `list`:

``````table_all <- table1
table_all\$id_g <- lapply(table_all\$id_g, function(v){data.table(table2[id_g %in% v,id_values])})
table_all
#   id         id_g
#1:  1 <data.table>
#2:  2 <data.table>
#3:  3 <data.table>
table_all\$id_g[1]
#[[1]]
#          V1
#1: 1,2,3,4,5
#2:       1,2
#3:         1
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download