Conor Neilson - 1 year ago 63

R Question

I have a data frame made up of 3 continuous response variables and 2 categorical predictor variables. I have been modelling each response variable separately, but using the same predictor variables. I would like to make 3 bar charts that share the same x axis, one for each response variable. It would be nice to get formatting similar to `facet_wrap`.

```
library(ggplot2)

y1 <- sample(1:150, 100, replace=TRUE)
y2 <- sample(1:150, 100, replace=TRUE)
y3 <- sample(1:150, 100, replace=TRUE)
x1 <- sample(x=c("Site1", "Site2"), size=100, replace=TRUE, prob=rep(1/2,2))
x2 <- sample(x=c("A", "B", "C", "D"), size=100, replace=TRUE, prob=rep(1/4,4))
df <- data.frame(y1,y2,y3,x1,x2)

ggplot(df, aes(x=x2, y=y1, fill=x1))

# summarySE() is defined further down in this question
y1sum <- summarySE(df, measurevar="y1", groupvars=c("x1", "x2"))

ggplot(y1sum, aes(x=x2, y=y1, fill=x1)) +
  geom_bar(position=position_dodge(), stat="identity") +
  geom_errorbar(aes(ymin=y1-ci, ymax=y1+ci), width=.2,
                position=position_dodge(.9))
```

So I'd like to get the above graph, but for each response variable and stacked on top of each other.

As an aside, I'd also appreciate some guidance on how to add some letters above each set of bars to show which are significantly different.

The summarySE function is based on the code from here:

http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/

```
summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
                      conf.interval=.95, .drop=TRUE) {
    library(plyr)

    # New version of length which can handle NA's: if na.rm==T, don't count them
    length2 <- function (x, na.rm=FALSE) {
        if (na.rm) sum(!is.na(x))
        else       length(x)
    }

    # This does the summary. For each group's data frame, return a vector with
    # N, mean, and sd
    datac <- ddply(data, groupvars, .drop=.drop,
      .fun = function(xx, col) {
        c(N    = length2(xx[[col]], na.rm=na.rm),
          mean = mean   (xx[[col]], na.rm=na.rm),
          sd   = sd     (xx[[col]], na.rm=na.rm)
        )
      },
      measurevar
    )

    # Rename the "mean" column
    datac <- rename(datac, c("mean" = measurevar))

    datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean

    # Confidence interval multiplier for standard error
    # Calculate t-statistic for confidence interval:
    # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
    ciMult <- qt(conf.interval/2 + .5, datac$N-1)
    datac$ci <- datac$se * ciMult

    return(datac)
}
```

Thanks in advance to anyone who can offer advice.

Answer Source

I have used dplyr (together with tidyr's `gather` to reshape the data into long format) instead of the summarySE function you used.

```
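library(dplyr)
library(tidyr)  # gather() comes from tidyr, not dplyr

# Reshape to long format (one row per observation per response variable),
# then summarise each x1 / x2 / variable combination
test <- df %>% gather(., key="var", value="value", -x1, -x2) %>%
  group_by(x1,x2,var) %>% summarise(N=n(),
                                    Mean = mean(value),
                                    sd= sd(value),
                                    se = sd/sqrt(N),
                                    ci = qnorm(0.975)*se) %>% ungroup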
```
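One difference from the `summarySE` function in the question is worth flagging: the summary above uses the normal quantile `qnorm(0.975)` as the confidence-interval multiplier, whereas `summarySE` uses the t quantile `qt(conf.interval/2 + .5, N-1)`. With roughly a dozen observations per group, as in the example data (100 rows over 8 site/category combinations), the t-based interval is noticeably wider. The base-R sketch below compares the two multipliers; the group sizes in `N` are illustrative values, not taken from the data.

```r
# Illustrative group sizes (assumed values, chosen only for comparison)
N <- c(5, 12, 30, 100)

z_mult <- qnorm(0.975)            # normal-based multiplier, the same for every N
t_mult <- qt(0.975, df = N - 1)   # t-based multiplier, as used in summarySE

# t_mult is always larger than z_mult and approaches it as N grows
comparison <- data.frame(N, z_mult, t_mult, ratio = t_mult / z_mult)
print(comparison)
```

To match the `summarySE` intervals, `qnorm(0.975)` could be swapped for `qt(0.975, df = N - 1)` inside `summarise`, since `N` is already computed earlier in the same call.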

The code below creates a single column of bar plots, coloured by site and faceted by response variable.

```
library(ggplot2)
test %>% ggplot(., aes(x=x2, y=Mean, fill=x1)) +
  geom_bar(position=position_dodge(), stat="identity") +
  geom_errorbar(aes(ymin=Mean-ci, ymax=Mean+ci), width=.2, position=position_dodge(.9)) +
  facet_wrap(~var, ncol = 1)
```

It may be worth considering a box plot instead, as box plots often convey more information about the dataset than bar plots.

```
df %>% gather(., key="var", value="value", -x1, -x2) %>%
  ggplot(., aes(x=x2, y=value, fill=x1)) +
  geom_boxplot() +
  facet_wrap(~var, ncol = 1)
```
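On the aside about letters above the bars: one possible approach is to store the letters in a column of the summary table and draw them with `geom_text`, dodged the same way as the bars. The sketch below only shows the plotting mechanics. The `sig` column is a hypothetical placeholder filled in by hand, not the result of any test; in practice the letters would come from a post hoc comparison (for example `TukeyHSD`, with a package such as multcompView to turn pairwise results into letters). The data are simulated with a fixed, balanced design so that the example is reproducible, which is an assumption that differs from the random sampling in the question.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)  # only so that the simulated responses are reproducible

# Simulated data with a balanced design (assumption: every Site x category
# combination is present, unlike the purely random sampling in the question)
df <- data.frame(
  y1 = sample(1:150, 100, replace = TRUE),
  y2 = sample(1:150, 100, replace = TRUE),
  y3 = sample(1:150, 100, replace = TRUE),
  x1 = rep(c("Site1", "Site2"), each = 50),
  x2 = rep(c("A", "B", "C", "D"), times = 25)
)

# Same long-format summary as in the answer
test <- df %>%
  gather(key = "var", value = "value", -x1, -x2) %>%
  group_by(x1, x2, var) %>%
  summarise(N = n(),
            Mean = mean(value),
            se = sd(value) / sqrt(N),
            ci = qnorm(0.975) * se,
            .groups = "drop")

# Hypothetical placeholder letters, NOT derived from a statistical test
test$sig <- ifelse(test$x1 == "Site1", "a", "b")

p <- ggplot(test, aes(x = x2, y = Mean, fill = x1)) +
  geom_col(position = position_dodge(.9)) +
  geom_errorbar(aes(ymin = Mean - ci, ymax = Mean + ci), width = .2,
                position = position_dodge(.9)) +
  # Place each letter just above the top of its error bar;
  # group = x1 makes the labels dodge in step with the bars
  geom_text(aes(y = Mean + ci, label = sig, group = x1),
            position = position_dodge(.9), vjust = -0.5) +
  facet_wrap(~var, ncol = 1)

print(p)
```

Using the same `position_dodge(.9)` width for the bars, error bars, and labels is what keeps the three layers aligned.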