TBP TBP - 21 days ago 10
R Question

Processing lists of lists by group

I would like to process a list of lists. Specifically, I want to extract the data frame that is the third member of each list, group those data frames by the first member of each list, and then apply several functions such as mean(), median(), sd() and length() to the data in each group. The output should be returned as a data frame that looks something like:

Grp mean sd ...
a 5.26 ... ...
b 6.25 ... ...

#fake data
test<-list(
#member 1=grouping var, 2=identity, 3=dataframe
list("a", 54, data.frame(x=c(1,2) ,y=c(3,4))),
list("b", 55, data.frame(x=c(5,6) ,y=c(7,8))),
list("a", 56, data.frame(x=c(9 ,10),y=c(11,12))),
list("b", 57, data.frame(x=c(13,14),y=c(15,NA)))
)

#what I thought could work but kicks out a strange error

library(plyr)  # provides ldply()
test2 <- ldply(test, .fun=unlist)
#note limited to just mean for now
tapply(test, factor(test$V1), FUN=function(x){mean(as.numeric(x[3:6]), na.rm=TRUE)}, simplify=TRUE)


So my questions are:
1. Why doesn't the above work?
2. This feels very clunky. Is there a more efficient way to do this?

Answer
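
On question 1, one likely cause can be checked directly. `test` is an unnamed list, so `test$V1` is NULL and the grouping factor has length zero, while tapply() expects one group label per element of its first argument. Passing `test2` instead would probably hit the same check, because the length of a data frame is its number of columns, not its number of rows. The snippet below is a minimal sketch that uses a shortened copy of the data from the question, and `err` is just an illustrative name:

```r
# Same structure as the question's `test`, shortened to two elements
test <- list(
  list("a", 54, data.frame(x = c(1, 2), y = c(3, 4))),
  list("b", 55, data.frame(x = c(5, 6), y = c(7, 8)))
)

# `test` has no names, so there is no element called V1
is.null(test$V1)         # TRUE
length(factor(test$V1))  # 0, while length(test) is 2

# tapply() compares the length of the index with the length of its
# first argument and stops when they differ; capture the message
err <- tryCatch(
  tapply(test, factor(test$V1), FUN = length),
  error = function(e) conditionMessage(e)
)
err
```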

In base R you can do:

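# Group the list elements by their first member, then row-bind the
# data frames (third member) within each group
df_list <- tapply(test, 
                  sapply(test, `[[`, 1), 
                  FUN=function(x) do.call(rbind, lapply(x, `[[`, 3)))
# Pool all values of each group's data frame and summarise them
t(sapply(df_list, function(x){
  list("mean"=mean(unlist(x), na.rm = TRUE),
       "sd"=sd(unlist(x), na.rm = TRUE),
       "median"=median(unlist(x), na.rm = TRUE))}))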

  mean     sd       median
a 6.5      4.440077 6.5   
b 9.714286 4.151879 8
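
The result above is a matrix whose cells are list elements, not the data frame the question asks for. If a plain data frame with a `Grp` column is wanted, one possible variation is sketched below. It pools all values of `x` and `y` per group, as the code above does. The names `grp`, `vals` and `res` are illustrative, and the `n` column counts non-missing values, which is an assumption about what length() should mean here.

```r
test <- list(
  list("a", 54, data.frame(x = c(1, 2),   y = c(3, 4))),
  list("b", 55, data.frame(x = c(5, 6),   y = c(7, 8))),
  list("a", 56, data.frame(x = c(9, 10),  y = c(11, 12))),
  list("b", 57, data.frame(x = c(13, 14), y = c(15, NA)))
)

# One group label per list element (first member)
grp <- vapply(test, function(el) el[[1]], character(1))

# Flatten each data frame (third member) to a numeric vector,
# split those vectors by group, then pool them within each group
vals <- split(lapply(test, function(el) unlist(el[[3]], use.names = FALSE)), grp)
vals <- lapply(vals, unlist)

# Build one row per group and stack the rows into a data frame
res <- do.call(rbind, lapply(names(vals), function(g) {
  v <- vals[[g]]
  data.frame(Grp    = g,
             mean   = mean(v, na.rm = TRUE),
             sd     = sd(v, na.rm = TRUE),
             median = median(v, na.rm = TRUE),
             n      = sum(!is.na(v)))  # assumption: count non-missing values
}))
res
```

Further summary functions can be added as extra columns inside the data.frame() call.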