drmariod - 1 year ago 96
R Question

# Calculate means of variables in melted dataframe, grouped by name1

I have a problem using melt and mean on a data frame. Maybe it is just a beginner problem.

I have a data frame with two conditions and three replicates per condition:

``````
set.seed(1)
testdf <- data.frame(
  name1  = letters[1:5],
  con1_1 = floor(runif(5, 20, 35)),
  con1_2 = floor(runif(5, 20, 35)),
  con1_3 = floor(runif(5, 20, 35)),
  con2_1 = floor(runif(5, 20, 35)),
  con2_2 = floor(runif(5, 20, 35)),
  con2_3 = floor(runif(5, 20, 35))
)

print(testdf)
name1 con1_1 con1_2 con1_3 con2_1 con2_2 con2_3
1     a     23     33     23     27     34     25
2     b     25     34     22     30     23     20
3     c     28     29     30     34     29     25
4     d     33     29     25     25     21     33
5     e     23     20     31     31     24     25
``````

Now I melt the data frame and want to calculate the mean per condition, to get something like this:

``````
library(reshape2)  # provides melt()
melt(testdf[1:7], id.vars = 'name1')
name1 variable value   mean_con1  mean_con2
1      a   con1_1    23    26.33333   28.66667
2      b   con1_1    25    27.00000   24.33333
3      c   con1_1    28    29.00000   29.33333
4      d   con1_1    33    29.00000   26.33333
5      e   con1_1    23    24.66667   26.66667
6      a   con1_2    33    26.33333   28.66667
7      b   con1_2    34    27.00000   24.33333
8      c   con1_2    29    29.00000   29.33333
9      d   con1_2    29    29.00000   26.33333
10     e   con1_2    20    24.66667   26.66667
11     a   con1_3    23    26.33333   28.66667
...
``````

Is there a way to calculate the mean with the melted data frame, or do I have to recreate the original data frame and calculate the mean like below?

``````
testdf$mean_con1 <- apply(testdf[2:4], 1, mean)
testdf$mean_con2 <- apply(testdf[5:7], 1, mean)
``````
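For reference, the wide-format calculation above can be written without hard-coded column positions. This is only a sketch in base R: the data frame is typed in from the printed output above so the block runs on its own, and the `mean_` prefix for the new columns is taken from the question.

```r
# Same values as the printed testdf above, entered by hand
testdf <- data.frame(
  name1  = letters[1:5],
  con1_1 = c(23, 25, 28, 33, 23),
  con1_2 = c(33, 34, 29, 29, 20),
  con1_3 = c(23, 22, 30, 25, 31),
  con2_1 = c(27, 30, 34, 25, 31),
  con2_2 = c(34, 23, 29, 21, 24),
  con2_3 = c(25, 20, 25, 33, 25)
)

# For each condition prefix, average all columns whose names start with it
for (cond in c("con1", "con2")) {
  cols <- grep(paste0("^", cond, "_"), names(testdf))
  testdf[[paste0("mean_", cond)]] <- rowMeans(testdf[cols])
}

print(testdf[c("name1", "mean_con1", "mean_con2")])
```

Selecting columns by name prefix means the loop keeps working if a fourth replicate is added later.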

EDIT:

I just realised I made a mistake in my question. Your answers are right, but what I am actually looking for is the means in a single column, alternating depending on the variable.
Does this make sense?

``````
melt(testdf[1:7], id.vars = 'name1')
name1 variable value       mean
1      a   con1_1    23   26.33333
2      b   con1_1    25   27.00000
3      c   con1_1    28   29.00000
4      d   con1_1    33   29.00000
5      e   con1_1    23   24.66667
6      a   con1_2    33   26.33333
7      b   con1_2    34   27.00000
...
16     a   con2_1    27   28.66667
17     b   con2_1    30   24.33333
18     c   con2_1    34   29.33333
19     d   con2_1    25   26.33333
20     e   con2_1    31   26.66667
21     a   con2_2    34   28.66667
22     b   con2_2    23   24.33333
...
``````

OK, maybe I should explain what I am trying to do. I want to plot the data with ggplot: bars representing the mean, with the individual values overlaid as geom_point. I want to colour the dots by replicate and group everything by condition. I hope someone has an idea.

Using `base` functions:

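``````
# Split column names on "_": con1/con2 become columns, the replicate number becomes "time"
df1 <- reshape(testdf, idvar = "name1", varying = 2:7, direction = "long", sep = "_")
# Mean of con1 and con2 (columns 3 and 4) per name1
aggregate(df1[, 3:4], list(name1 = df1[, 1]), FUN = mean)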
#  name1     con1     con2
#1     a 26.33333 28.66667
#2     b 27.00000 24.33333
#3     c 29.00000 29.33333
#4     d 29.00000 26.33333
#5     e 24.66667 26.66667
``````

Or you can use `data.table` to calculate the means after reshaping:

``````
library(data.table)
setDT(df1)[, lapply(.SD, mean), .SDcols = c("con1", "con2"), by = name1]
#   name1     con1     con2
#1:     a 26.33333 28.66667
#2:     b 27.00000 24.33333
#3:     c 29.00000 29.33333
#4:     d 29.00000 26.33333
#5:     e 24.66667 26.66667
``````
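`data.table` can also attach the group mean to fully long data as a new column instead of returning a summary table. The following is a sketch rather than part of the original answer: the data is typed in from the printed `testdf`, and the column names `cond` and `Mean` are chosen here for illustration.

```r
library(data.table)

# Same values as the printed testdf, entered by hand
testdf <- data.frame(
  name1  = letters[1:5],
  con1_1 = c(23, 25, 28, 33, 23),
  con1_2 = c(33, 34, 29, 29, 20),
  con1_3 = c(23, 22, 30, 25, 31),
  con2_1 = c(27, 30, 34, 25, 31),
  con2_2 = c(34, 23, 29, 21, 24),
  con2_3 = c(25, 20, 25, 33, 25)
)

# One row per name1 / replicate column
long <- melt(as.data.table(testdf), id.vars = "name1")

# Strip the replicate suffix: con1_1 -> con1
long[, cond := sub("_.*", "", variable)]

# Add the mean per name1 and condition to every row, by reference
long[, Mean := mean(value), by = .(name1, cond)]

print(head(long))
```

Because `:=` adds the column in place, every replicate row keeps its own `value` alongside the shared `Mean`.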

### Update

If you need it in the long format:

``````
library(reshape2)
res <- within(melt(testdf, id = "name1"), {
  variable <- gsub("_.*", "", variable)  # con1_1 -> con1
  Mean <- ave(value, name1, variable)    # mean per name1 and condition
})

#  name1 variable value     Mean
#1     a     con1    23 26.33333
#2     b     con1    25 27.00000
#3     c     con1    28 29.00000
#4     d     con1    33 29.00000
#5     e     con1    23 24.66667
#6     a     con1    33 26.33333

library(ggplot2)
res$repl <- factor(rep(rep(1:3, each = 5), 2))  # replicate number, as a factor so the dots get discrete colours
p <- ggplot(res, aes(name1, Mean))
p +
geom_bar(position="dodge", stat="identity", fill="LightBlue") +
geom_point(data=res, aes(name1, value, colour=repl))+
facet_wrap(~variable) +
theme_bw()
``````
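As an alternative to precomputing `Mean`, ggplot2 can compute the bar heights at plot time. This is a sketch under a few assumptions: it needs a ggplot2 version whose `stat_summary()` accepts the `fun` argument (3.3.0 or later; older versions used `fun.y`), the data is typed in from the printed `testdf`, and the names `long`, `condition` and `repl` are chosen here for illustration.

```r
library(ggplot2)

# Same values as the printed testdf, entered by hand
testdf <- data.frame(
  name1  = letters[1:5],
  con1_1 = c(23, 25, 28, 33, 23),
  con1_2 = c(33, 34, 29, 29, 20),
  con1_3 = c(23, 22, 30, 25, 31),
  con2_1 = c(27, 30, 34, 25, 31),
  con2_2 = c(34, 23, 29, 21, 24),
  con2_3 = c(25, 20, 25, 33, 25)
)

# Long format with base R: stack() returns the columns "values" and "ind"
long <- data.frame(name1 = rep(testdf$name1, times = 6), stack(testdf[-1]))
long$condition <- sub("_.*", "", long$ind)   # con1_1 -> con1
long$repl <- factor(sub(".*_", "", long$ind)) # con1_1 -> 1

p <- ggplot(long, aes(name1, values)) +
  stat_summary(fun = mean, geom = "bar", fill = "lightblue") + # bar = mean per name1 and panel
  geom_point(aes(colour = repl)) +                              # raw values, coloured by replicate
  facet_wrap(~condition) +
  theme_bw()

print(p)
```

With this approach the long data only needs the raw values, so no repeated `Mean` column has to be kept in sync with them.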
