Zia Ranks - 1 month ago
R Question

Means of vectors in a dataframe by factor

I am trying to create a new dataframe that is a condensed version of a series of vectors.

My data is built something like this:

mat <- matrix(1:18, 6)
g <- c("a", "a", "b", "b", "c", "c")
df <- cbind(g, mat)


I would like to get a result_df like this:

a 1.5 7.5 13.5
b 3.5 9.5 15.5
c 5.5 11.5 17.5


I am running into trouble when I try a for loop. Is there a way lapply() or apply() can do this natively? Is there a simpler solution?

Answer

Another option, which might be more flexible for future needs, is to use dplyr. This requires the data to be in a data.frame. Note that cbind(g, mat) produces a character matrix rather than a data.frame, so build the data with data.frame() instead:

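library(dplyr)

# data.frame() keeps the numeric columns numeric (they are named X1, X2, X3)
df <- data.frame(g, mat)

# summarise_all() still works but is superseded in dplyr >= 1.0;
# summarise(across(everything(), mean)) is the current equivalent
df %>%
  group_by(g) %>%
  summarise_all(mean)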

It groups by the g column, then takes the mean of each remaining column. It returns:

      g    X1    X2    X3
1     a   1.5   7.5  13.5
2     b   3.5   9.5  15.5
3     c   5.5  11.5  17.5

I believe that is your desired outcome. Combined with tidyr, you can also put the means in a long format, which may make them easier to use and access:

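library(dplyr)
library(tidyr)

# gather() still works but is superseded by pivot_longer() in tidyr >= 1.0
df %>%
  gather(Measurement, Value, -g) %>%
  group_by(g, Measurement) %>%
  summarise(mean = mean(Value))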

returning:

      g Measurement  mean
1     a          X1   1.5
2     a          X2   7.5
3     a          X3  13.5
4     b          X1   3.5
5     b          X2   9.5
6     b          X3  15.5
7     c          X1   5.5
8     c          X2  11.5
9     c          X3  17.5
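
Since the question also asks whether this can be done natively, here is a minimal base R sketch, assuming the same g and mat as in the question. It is one possible approach rather than the only one, and means_mat is just an illustrative name. aggregate() handles the grouping directly, while split() plus sapply() is closer to the lapply()/apply() style mentioned in the question.

```r
mat <- matrix(1:18, 6)
g <- c("a", "a", "b", "b", "c", "c")
df <- data.frame(g, mat)

# Formula interface: "." stands for every column other than g
result_df <- aggregate(. ~ g, data = df, FUN = mean)

# apply-family alternative: split the numeric columns by group,
# take column means per group, then transpose so groups are rows
means_mat <- t(sapply(split(df[-1], df$g), colMeans))
```

result_df is a data.frame with columns g, X1, X2 and X3, while means_mat is a numeric matrix whose row names are the groups. Both should hold the same group means as the dplyr result above.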