Andrew Andrew - 2 months ago 14
R Question

Converting the output of R's "by" command to a data frame

I'm trying to use R's "by" command to get column means for subsets of a data frame. For example, consider this data frame:

> z = data.frame(labels=c("a","a","b","c","c"),data=matrix(1:20,nrow=5))
> z
  labels data.1 data.2 data.3 data.4
1      a      1      6     11     16
2      a      2      7     12     17
3      b      3      8     13     18
4      c      4      9     14     19
5      c      5     10     15     20


I can use R's "by" command to get the column means according to the labels column:

> by(z[,2:5],z$labels,colMeans)
z$labels: a
data.1 data.2 data.3 data.4
   1.5    6.5   11.5   16.5
------------------------------------------------------------
z$labels: b
data.1 data.2 data.3 data.4
     3      8     13     18
------------------------------------------------------------
z$labels: c
data.1 data.2 data.3 data.4
   4.5    9.5   14.5   19.5


But how do I coerce the output back to a data frame? as.data.frame doesn't work:

> as.data.frame(by(z[,2:5],z$labels,colMeans))
Error in as.data.frame.default(by(z[, 2:5], z$labels, colMeans)) :
cannot coerce class '"by"' into a data.frame
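For reference, the "by" object itself can be converted without switching functions. A minimal sketch, assuming every group returns a numeric vector of the same length (true for colMeans here): by() stores one named vector per label, so do.call(rbind, ...) stacks them into a matrix whose row names are the labels, and data.frame() then moves those row names into a column. The names res, m and df are only illustrative.

```r
# Sample data from the question
z <- data.frame(labels = c("a", "a", "b", "c", "c"),
                data = matrix(1:20, nrow = 5))

# by() returns a list-like array holding one named numeric vector per label
res <- by(z[, 2:5], z$labels, colMeans)

# Stack the per-group vectors row-wise; the labels become the row names
m <- do.call(rbind, res)

# Build a data frame, copying the row names into a 'labels' column
df <- data.frame(labels = rownames(m), m, row.names = NULL)
df
```

This relies on each group producing a vector with the same names; if FUN returned objects of different shapes, rbind would not line them up.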

Answer

You can use ddply from the plyr package:

library(plyr)
ddply(z, .(labels), numcolwise(mean))
  labels data.1 data.2 data.3 data.4
1      a    1.5    6.5   11.5   16.5
2      b    3.0    8.0   13.0   18.0
3      c    4.5    9.5   14.5   19.5

Or aggregate from the stats package:

aggregate(z[,-1], by=list(z$labels), mean)
  Group.1 data.1 data.2 data.3 data.4
1       a    1.5    6.5   11.5   16.5
2       b    3.0    8.0   13.0   18.0
3       c    4.5    9.5   14.5   19.5
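The grouping column comes out as Group.1 because the list passed to by is unnamed. A small sketch of two ways to keep the original name: naming the list element, or using the formula method of aggregate, where the "." on the left-hand side stands for all remaining columns. The variable names agg1 and agg2 are just for illustration.

```r
z <- data.frame(labels = c("a", "a", "b", "c", "c"),
                data = matrix(1:20, nrow = 5))

# Naming the grouping list element sets the name of the output column
agg1 <- aggregate(z[, -1], by = list(labels = z$labels), FUN = mean)

# Formula interface: average every other column within each label
agg2 <- aggregate(. ~ labels, data = z, FUN = mean)

agg1
agg2
```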

Or dcast from the reshape2 package:

library(reshape2)
# id.vars = "labels" tells melt which column identifies the groups
dcast(melt(z, id.vars = "labels"), labels ~ variable, mean)

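Or, in base R, split the numeric columns by label and apply colMeans with sapply. Note that this returns a matrix with the labels as row names, not a data frame:

t(sapply(split(z[,-1], z$labels), colMeans))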
  data.1 data.2 data.3 data.4
a    1.5    6.5   11.5   16.5
b    3.0    8.0   13.0   18.0
c    4.5    9.5   14.5   19.5
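To get an actual data frame from the sapply version, wrapping the matrix in as.data.frame is enough; the labels stay as row names. A short sketch (the name means is illustrative), which also copies the labels into an ordinary column in case later code expects one:

```r
z <- data.frame(labels = c("a", "a", "b", "c", "c"),
                data = matrix(1:20, nrow = 5))

# Column means per label, transposed so each label is a row, then coerced
means <- as.data.frame(t(sapply(split(z[, -1], z$labels), colMeans)))

# The labels are kept as row names; add them as a regular column as well
means$labels <- rownames(means)
means
```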