foehn - 3 months ago
R Question

rapply to nested list of data frames in R

I have a nested list whose leaf elements are data frames, and I want to traverse the list recursively, apply some computation to each data frame, and get back a nested list of results with the same structure as the input. I know rapply is meant for exactly this kind of task, but it goes deeper than I want: because a data frame is itself a list in R, rapply descends into every data frame and applies the function to each column instead.

One workaround I can think of is to convert each data frame to a matrix, but that coerces all columns to a single type, so I would rather avoid it. Is there any way to control the recursion depth of rapply? Any ideas? Thanks.
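The behaviour described in the question can be reproduced with a small sketch. It uses the built-in BOD data frame; the list L and the variable seen are illustrative names, not part of the original post.

```r
# BOD is a small data frame that ships with base R (datasets package).
L <- list(a = BOD, b = list(c = BOD))

# Ask at every leaf whether the function was handed a data frame.
seen <- rapply(L, function(x) is.data.frame(x), how = "list")

# Every leaf is a single column, so the function never sees a data frame.
str(seen)
```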

Answer

1. wrap in proto

When creating your list structure try wrapping the data frames in proto objects:

library(proto)
L <- list(a = proto(DF = BOD), b = proto(DF = BOD))
rapply(L, f = function(.) colSums(.$DF), how = "replace")

giving:

$a
  Time demand 
    22     89 

$b
  Time demand 
    22     89 
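The wrapping helps because a proto object is an environment rather than a list, so rapply treats it as a leaf and does not descend into the data frame inside it. The same idea can probably be sketched in base R with a plain environment; wrap_df below is a hypothetical helper introduced only for illustration, not something from the proto package.

```r
# Hypothetical helper: hide a data frame inside an environment so that
# rapply sees a non-list leaf instead of a list of columns.
wrap_df <- function(df) {
  e <- new.env()
  e$DF <- df
  e
}

L <- list(a = wrap_df(BOD), b = list(c = wrap_df(BOD)))

# The function receives the environment and pulls the data frame back out.
out <- rapply(L, function(e) colSums(e$DF), how = "replace")
str(out)
```

This avoids the extra package dependency, at the cost of losing proto's object-oriented conveniences.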

Wrap the result of your function in a proto object too if you want to apply rapply to it again:

f <- function(.) proto(result = colSums(.$DF))
out <- rapply(L, f = f, how = "replace")
str(out)

giving:

List of 2
 $ a:proto object 
 .. $ result: Named num [1:2] 22 89 
 ..  ..- attr(*, "names")= chr [1:2] "Time" "demand" 
 $ b:proto object 
 .. $ result: Named num [1:2] 22 89 
 ..  ..- attr(*, "names")= chr [1:2] "Time" "demand" 

2. write your own rapply alternative

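# Apply f to every data frame in a nested list, keeping the list structure.
recurse <- function(L, f) {
    # A data frame is a leaf: apply f to it instead of descending into its columns.
    if (inherits(L, "data.frame")) f(L)
    else lapply(L, recurse, f)
}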

L <- list(a = BOD, b = BOD)
recurse(L, colSums)

This gives:

$a
  Time demand 
    22     89 

$b
  Time demand 
    22     89 
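If the nested list is deeper, or contains leaves that are not data frames, the same pattern can be extended. The following is only a sketch under that assumption: recurse2 is a hypothetical variant that passes extra arguments on to f and returns non-list leaves unchanged.

```r
# Hypothetical variant of recurse: data frames are leaves, other lists are
# traversed, and anything else is returned untouched.
recurse2 <- function(x, f, ...) {
  if (is.data.frame(x)) f(x, ...)
  else if (is.list(x)) lapply(x, recurse2, f, ...)
  else x
}

# iris mixes numeric and factor columns, so no matrix conversion is involved.
nested <- list(a = BOD, b = list(c = iris, note = "raw"))
res <- recurse2(nested, nrow)
str(res)
```

Because each data frame is passed to f whole, columns of different types stay intact, which sidesteps the matrix workaround mentioned in the question.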
