Andy.Jian - 1 month ago
R Question

How to manipulate and merge data.frames in a nested list more efficiently?

I got two lists of data.frames as the output of a custom function. I intend to split each data.frame in each list, which gives me a nested list. However, I then want to regroup and merge the pieces of this nested list. Working with nested lists is a bit tricky, and I couldn't manipulate them as I expected. Does anyone know a useful trick to accomplish this task more easily and efficiently? How can I get my desired output? Thanks in advance.

Minimal example:

myList_keep <- list(
hola.keep= data.frame( from=seq(1, by=4, len=15), to=seq(3, by=4, len=15), value=sample(30, 15)),
boo.keep = data.frame( from=seq(3, by=7, len=20), to=seq(6, by=7, len=20), value=sample(30, 20)),
meh.keep = data.frame( from=seq(4, by=8, len=25), to=seq(7, by=8, len=25), value=sample(30, 25))
)

myList_drop <- list(
hola.drop= data.frame( from=seq(11, by=7, len=10), to=seq(23, by=7, len=10), value=sample(15, 10)),
boo.drop = data.frame( from=seq(18, by=5, len=12), to=seq(26, by=5, len=12), value=sample(18, 12)),
meh.drop = data.frame( from=seq(24, by=8, len=15), to=seq(37, by=8, len=15), value=sample(30, 15))
)


I tried to split each data.frame as below:

splt_keep <- lapply(myList_keep, function(ele_) {
res <- split(ele_, ifelse(ele_$value >=10, "above", "below"))
})

splt_drop <- lapply(myList_drop, function(ele_) {
res <- split(ele_, ifelse(ele_$value >=10, "above", "below"))
})


I intend to manipulate the nested list in this way:

For example, if I could manipulate splt_keep and splt_drop efficiently, I would get this skeleton of the nested list:

$hola.above
$hola.keep$above
$hola.drop$above

$hola.below
$hola.keep$below
$hola.drop$below


Then, once I have this format, I intend to merge the pieces accordingly, so the final output format would be:

$hola
$hola.above
$hola.below

$boo
$boo.above
$boo.below

$meh
$meh.above
$meh.below


How can I get my desired output easily? Is there a more comfortable way to manipulate nested lists? Can anyone point me to how to make this happen?

Answer

Lists are a very inefficient structure for split/bind operations, especially for well-structured data. Here is an option using data.table:

## Stack each list into a single data.table.
## Setting idcol=TRUE adds a new .id column holding the name of the
## list element each row came from.
library(data.table)
keep_dt <- rbindlist(myList_keep, idcol=TRUE)
drop_dt <- rbindlist(myList_drop, idcol=TRUE)
DT <- rbind(keep_dt, drop_dt)
## Create the grouping column (same >= 10 threshold as in the question)
DT[, gr := ifelse(value >= 10, "above", "below")]
## To get the "hola" group, filter the whole table first,
## then split the filtered rows by their own gr column
hola <- DT[grepl("hola", .id)]
split(hola, hola$gr)
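
If data.table is not available, the same stack-then-split idea can be sketched in base R. This is only an illustration under assumptions: the toy data is made up (it is not the question's random samples), and stack_list() is a hypothetical helper name, not part of any package.

```r
# Toy data, made up for illustration
keep_list <- list(
  hola.keep = data.frame(from = c(1, 5),  to = c(3, 7),  value = c(12, 4)),
  boo.keep  = data.frame(from = c(3, 10), to = c(6, 13), value = c(9, 25))
)

# stack_list() is a hypothetical helper: bind a named list of data.frames
# into one data.frame, recording each row's origin in a .id column
stack_list <- function(lst) {
  tagged <- Map(function(df, id) cbind(.id = id, df), lst, names(lst))
  out <- do.call(rbind, tagged)
  rownames(out) <- NULL
  out
}

stacked <- stack_list(keep_list)

# Flag each row with the same >= 10 threshold used in the question
stacked$gr <- ifelse(stacked$value >= 10, "above", "below")

# Filter first, then split the filtered rows by their own gr column
hola <- stacked[grepl("hola", stacked$.id), ]
hola_split <- split(hola, hola$gr)
```

Here hola_split is a list with elements "above" and "below", each holding the matching hola rows.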

Update

To get the expected output:

DT[,.id:= gsub("[.](keep|drop)","",.id)]
by(DT,DT$.id,FUN = function(x)split(x,x$gr))
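
As a possible alternative, the split() method for data.table accepts a by argument, and with flatten = FALSE it returns a nested list directly. The sketch below assumes data.table 1.9.8 or later and uses made-up toy data; the column names name and origin are illustrative choices, not part of the original answer. Keeping origin in a separate column preserves whether each row came from the keep or the drop list.

```r
library(data.table)

# Toy data, made up for illustration
keep_list <- list(
  hola.keep = data.frame(from = c(1, 5),  to = c(3, 7),  value = c(12, 4)),
  boo.keep  = data.frame(from = c(3, 10), to = c(6, 13), value = c(9, 25))
)
drop_list <- list(
  hola.drop = data.frame(from = c(11, 18), to = c(23, 30), value = c(15, 2)),
  boo.drop  = data.frame(from = c(18, 23), to = c(26, 31), value = c(7, 11))
)

DT <- rbind(rbindlist(keep_list, idcol = TRUE),
            rbindlist(drop_list, idcol = TRUE))

# Separate the list-element name into its two parts, e.g. "hola" and "keep"
DT[, name   := sub("[.](keep|drop)$", "", .id)]
DT[, origin := sub("^.*[.]", "", .id)]
DT[, gr     := ifelse(value >= 10, "above", "below")]

# Nested list: first level by name, second level by above/below
nested <- split(DT, by = c("name", "gr"), flatten = FALSE)
nested$hola$above
```

With this layout, nested$hola$above holds the merged keep and drop rows for hola with value >= 10, which matches the $hola / $hola.above shape asked for in the question.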