TBP - 3 months ago 42

R Question

I would like to process a list of lists. Specifically, I want to extract the data frame that is the third member of each list, group the data frames by a grouping variable (the first member of each list), and then apply several functions such as mean(), median(), sd(), and length() to the data in each group. The output should be returned as a data frame and would look something like:

    Grp  mean  sd   ...
    a    5.26  ...  ...
    b    6.25  ...  ...

    # fake data
    # member 1 = grouping var, 2 = identity, 3 = data frame
    test <- list(
      list("a", 54, data.frame(x = c(1, 2),   y = c(3, 4))),
      list("b", 55, data.frame(x = c(5, 6),   y = c(7, 8))),
      list("a", 56, data.frame(x = c(9, 10),  y = c(11, 12))),
      list("b", 57, data.frame(x = c(13, 14), y = c(15, NA)))
    )

    # what I thought could work but kicks out a strange error
    library(plyr)  # ldply() comes from the plyr package
    test2 <- ldply(test, .fun = unlist)

    # note: limited to just mean for now
    tapply(test, factor(test$V1), FUN = function(x) {mean(as.numeric(x[3:6]), na.rm = TRUE)}, simplify = TRUE)

So my questions are:

1. Why doesn't the above work?

2. This feels very clunky. Is there a more efficient way to do this?

Answer

In base R you can do:

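```
# Split the list by its first member (the group) and, within each group,
# row-bind the third member (the data frame) of every element.
df_list <- tapply(test,
                  sapply(test, `[[`, 1),
                  FUN = function(x) do.call(rbind, lapply(x, `[[`, 3)))

# Pool all values of each group's data frame and compute the summaries.
t(sapply(df_list, function(x) {
  list("mean"   = mean(unlist(x), na.rm = TRUE),
       "sd"     = sd(unlist(x), na.rm = TRUE),
       "median" = median(unlist(x), na.rm = TRUE))
}))
#   mean     sd       median
# a 6.5      4.440077 6.5
# b 9.714286 4.151879 8
```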
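
The question asks for a data frame (including a count such as length()), whereas the call above returns a matrix whose cells are list elements. A minimal base R sketch of one way to get a plain data frame follows. The names `grp`, `vals_by_grp`, `summarise_group`, `res`, and the column `n` are illustrative assumptions, not part of the original answer, and `n` is assumed to mean the number of non-missing values.

```r
# Fake data from the question: each element is list(group, id, data.frame)
test <- list(
  list("a", 54, data.frame(x = c(1, 2),   y = c(3, 4))),
  list("b", 55, data.frame(x = c(5, 6),   y = c(7, 8))),
  list("a", 56, data.frame(x = c(9, 10),  y = c(11, 12))),
  list("b", 57, data.frame(x = c(13, 14), y = c(15, NA)))
)

# Grouping variable: the first member of each inner list
grp <- vapply(test, function(el) el[[1]], character(1))

# Flatten the third member (the data frame) of each element to a numeric
# vector, then pool those vectors per group
vals <- lapply(test, function(el) unlist(el[[3]], use.names = FALSE))
vals_by_grp <- lapply(split(vals, grp), unlist)

# Hypothetical helper: one row of summary statistics for a numeric vector
summarise_group <- function(v) {
  data.frame(
    mean   = mean(v, na.rm = TRUE),
    sd     = sd(v, na.rm = TRUE),
    median = median(v, na.rm = TRUE),
    n      = sum(!is.na(v))  # assumption: count only non-missing values
  )
}

# Stack the per-group rows and add the group label as a regular column
res <- do.call(rbind, lapply(vals_by_grp, summarise_group))
res <- cbind(Grp = names(vals_by_grp), res)
rownames(res) <- NULL
print(res)
```

With the fake data, `res` should have one row per group, with the same mean, sd, and median values as the output above, plus `n` equal to 8 for group a and 7 for group b. As for why the `tapply()` attempt in the question fails, one likely contributor is that it indexes `test$V1`, but `test` is an unnamed list, so `test$V1` is `NULL` and the index does not have the same length as `test`.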