Conor Neilson - 1 year ago
R Question

ggplot2 - stacking graphs with different response variables but same x variable

I have a data frame made up of 3 continuous response variables and 2 categorical predictor variables. I have been modelling each response variable separately, but using the same predictor variables. I would like to make 3 bar charts, one per response variable, that share the same x axis. It would be nice to get the formatting of something like

since each graph then wouldn't need its own x-axis. I've attached some sample data, and some code to show one of the graphs I produced.

library(ggplot2)

y1<-sample(1:150, 100, replace=TRUE)
y2<-sample(1:150, 100, replace=TRUE)
y3<-sample(1:150, 100, replace=TRUE)
x1<-sample(x=c("Site1", "Site2"), size=100, replace=TRUE, prob=rep(1/2,2))
x2<-sample(x=c("A", "B", "C", "D"), size=100, replace=TRUE, prob=rep(1/4,4))

# Combine the vectors into the data frame used below
df<-data.frame(y1, y2, y3, x1, x2)

ggplot(df, aes(x=x2, y=y1, fill=x1))

y1sum<-summarySE(df, measurevar="y1", groupvars=c("x1", "x2"))

ggplot(y1sum, aes(x=x2, y=y1, fill=x1)) + geom_bar(position=position_dodge(),
stat="identity") + geom_errorbar(aes(ymin=y1-ci, ymax=y1+ci), width=.2,
position=position_dodge(.9))

So I'd like to get the above graph, but for each response variable and stacked on top of each other.

As an aside, I'd also appreciate some guidance on how to add some letters above each set of bars to show which are significantly different.

The summarySE function is based off the code from here

summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
                      conf.interval=.95, .drop=TRUE) {
    library(plyr)

    # New version of length which can handle NA's: if na.rm==T, don't count them
    length2 <- function (x, na.rm=FALSE) {
        if (na.rm) sum(!is.na(x))
        else       length(x)
    }

    # This does the summary. For each group's data frame, return a vector with
    # N, mean, and sd
    datac <- ddply(data, groupvars, .drop=.drop,
      .fun = function(xx, col) {
        c(N    = length2(xx[[col]], na.rm=na.rm),
          mean = mean   (xx[[col]], na.rm=na.rm),
          sd   = sd     (xx[[col]], na.rm=na.rm)
        )
      },
      measurevar
    )

    # Rename the "mean" column
    datac <- rename(datac, c("mean" = measurevar))

    datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean

    # Confidence interval multiplier for standard error
    # Calculate t-statistic for confidence interval:
    # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
    ciMult <- qt(conf.interval/2 + .5, datac$N-1)
    datac$ci <- datac$se * ciMult

    return(datac)
}

Thanks in advance to anyone who can offer advice.

Answer Source

I have used dplyr (together with gather from tidyr) instead of the summarySE function you used:

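library(dplyr)
library(tidyr)    # gather() comes from tidyr
library(ggplot2)

# Reshape to long format, then summarise each x1/x2/variable combination
test <- df %>% gather(., key="var", value="value", -x1, -x2) %>%
  group_by(x1,x2,var) %>% summarise(N=n(), 
                                    Mean = mean(value), 
                                    sd= sd(value),
                                    se = sd/sqrt(N),
                                    ci = qnorm(0.975)*se) %>% ungroup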
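
One difference from summarySE worth knowing about: the summary above uses a normal multiplier (qnorm) for the 95% interval, whereas summarySE uses a t multiplier with N - 1 degrees of freedom. A quick base R comparison, with group sizes chosen here purely for illustration, shows how much the two differ for small groups:

```r
# Compare the normal (qnorm) and t (qt) multipliers for a 95% interval.
# The group sizes are illustrative, not taken from the question's data.
n <- c(5, 12, 30, 100)
mult <- data.frame(
  N      = n,
  z_mult = qnorm(0.975),           # multiplier used in the summarise() call
  t_mult = qt(0.975, df = n - 1)   # multiplier used by summarySE (df = N - 1)
)
print(mult)
```

If you want intervals that match summarySE, swapping qnorm(0.975) for qt(0.975, df = N - 1) inside the summarise() call should do it.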

The code below creates a single column of bar plots, coloured by site and faceted by variable.

test %>%  ggplot(., aes(x=x2, y=Mean, fill=x1)) +
  geom_bar(position=position_dodge(), stat="identity") + 
  geom_errorbar(aes(ymin=Mean-ci, ymax=Mean+ci), width=.2,position=position_dodge(.9)) +
  facet_wrap(~var, ncol = 1)
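
If the three responses turn out to be on different scales, the shared y axis can squash the smaller ones. A minimal sketch of one way around that, assuming a version of the sample data in which y3 is deliberately made ten times larger (that change, the seed, and the name p_free are my own, not from the question):

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible
# Sample data as in the question, except y3 is put on a larger scale
# (an assumption made here to show the effect of free y scales)
df <- data.frame(
  y1 = sample(1:150, 100, replace = TRUE),
  y2 = sample(1:150, 100, replace = TRUE),
  y3 = sample(1:1500, 100, replace = TRUE),
  x1 = sample(c("Site1", "Site2"), 100, replace = TRUE),
  x2 = sample(c("A", "B", "C", "D"), 100, replace = TRUE)
)

test <- df %>%
  gather(key = "var", value = "value", -x1, -x2) %>%
  group_by(x1, x2, var) %>%
  summarise(N = n(), Mean = mean(value), sd = sd(value),
            se = sd / sqrt(N), ci = qnorm(0.975) * se) %>%
  ungroup()

p_free <- ggplot(test, aes(x = x2, y = Mean, fill = x1)) +
  geom_bar(position = position_dodge(), stat = "identity") +
  geom_errorbar(aes(ymin = Mean - ci, ymax = Mean + ci),
                width = .2, position = position_dodge(.9)) +
  # One panel per response in a single column; each panel gets its own
  # y axis while the x axis stays shared
  facet_wrap(~var, ncol = 1, scales = "free_y")

print(p_free)
```

The only change from the plot above is scales = "free_y"; the x axis is still common to all panels, which is what the question asked for.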

It may be worth considering box plots instead, as they often convey more information about the dataset than bar plots.

df %>% gather(., key="var", value="value", -x1, -x2) %>% 
  ggplot(., aes(x=x2, y=value, fill=x1)) +geom_boxplot() +
  facet_wrap(~var, ncol = 1)
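
On the aside about letters above the bars: one possible approach is to carry the letters as an extra column in the summary table and draw them with geom_text. The sketch below uses a made-up summary table (the means, the ci value, and the letter column are all hypothetical); in practice the letters would come from whatever post-hoc test you run, not be typed in by hand.

```r
library(ggplot2)

# Hypothetical summary table: one row per bar. The means, intervals, and
# letters are invented for illustration only.
labs <- data.frame(
  x2     = rep(c("A", "B", "C", "D"), times = 2),
  x1     = rep(c("Site1", "Site2"), each = 4),
  Mean   = c(70, 82, 65, 90, 75, 60, 88, 72),
  ci     = 10,
  letter = c("a", "ab", "a", "b", "a", "a", "b", "ab")
)

# group = x1 is set explicitly so the text is dodged the same way as the
# bars; otherwise the label column would also contribute to the grouping
p <- ggplot(labs, aes(x = x2, y = Mean, fill = x1, group = x1)) +
  geom_bar(stat = "identity", position = position_dodge(0.9)) +
  geom_errorbar(aes(ymin = Mean - ci, ymax = Mean + ci),
                width = 0.2, position = position_dodge(0.9)) +
  # Place each letter just above the upper end of its own error bar
  geom_text(aes(y = Mean + ci, label = letter),
            position = position_dodge(0.9), vjust = -0.5)

print(p)
```

The same geom_text layer can be added to the faceted plot, provided the summary table has a letter column for every x1/x2/var combination. Packages such as multcompView can derive compact letter displays from pairwise p-values, though I have not shown that here.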