user3463225 user3463225 - 4 months ago 20
R Question

R: merge data.tables to create nested data.table with list for each row

The following code creates two data tables, with the same column id_g which is presented in different ways.

table1 <- data.table(id=c(1:3), id_g = vector('list', 3L))
set(table1, j='id_g', value=list(c(1:3), c(4:6), 7))
table1
# id | id_g
# 1 | 1,2,3
# 2 | 4,5
# 3 | 6,7

table2 <- data.table(id_g = c(1:7), id_values = vector('list', 7L))
set(table2, j='id_values', value= list(c(1:5), c(1:2), 1, 2, c(3:5), 3, c(1:2)))
table2
# id_g | id_values
# 1 | 1,2,3,4,5
# 2 | 1,2
# 3 | 1
# 4 | 2
# 5 | 3,4,5
# 6 | 3
# 7 | 1,2


Expected result:

table_all
# id | id_g
# 1 | <data.table>
# 2 | <data.table>
# 3 | <data.table>


where for example data.table for id = 1 is a list of three vectors: c(1:5), c(1,2), 1.

What is the most efficient way to join tables by id_g and create nested tables for each id? Nested tables should be a list of separate vectors with possibly different length.

Answer

There is bound to be a better way to select the rows of table2 wich have id_g %in% v, but try this for a start:

table_all <- table1
table_all$id_g <- lapply(table_all$id_g, function(v){table2[id_g %in% v,id_values]})
table_all
#   id   id_g
#1:  1 <list>
#2:  2 <list>
#3:  3 <list>
table_all$id_g[1]
#[[1]]
#[[1]][[1]]
#[1] 1 2 3 4 5
#
#[[1]][[2]]
#[1] 1 2
#
#[[1]][[3]]
#[1] 1

Pretty much the same, with id_g having data.table elements instead of list:

table_all <- table1
table_all$id_g <- lapply(table_all$id_g, function(v){data.table(table2[id_g %in% v,id_values])})
table_all
#   id         id_g
#1:  1 <data.table>
#2:  2 <data.table>
#3:  3 <data.table>
table_all$id_g[1]
#[[1]]
#          V1
#1: 1,2,3,4,5
#2:       1,2
#3:         1