aminards - 1 year ago 156

R Question

I am creating barplots with standard deviation bars using `ggplot2`. Here is a sample of my data:

```
SampleName Target.ID  Maj.Allele.Freq SD          AVG.MAF
W15-P2-1   rs1005533  99.74811083     24.98883743 93.70753223
W15-P2-2   rs1005533  100             24.98883743 93.70753223
W15-P2-3   rs1005533  100             24.98883743 93.70753223
W15-P2-4   rs1005533  100             24.98883743 93.70753223
W15-P2-1   rs1005533  99.94819995     24.98883743 93.70753223
W15-P2-2   rs1005533  100             24.98883743 93.70753223
W15-P2-3   rs1005533  100             24.98883743 93.70753223
W15-P2-4   rs1005533  100             24.98883743 93.70753223
W21-P2-1   rs1005533  100             24.98883743 93.70753223
W21-P2-2   rs1005533  100             24.98883743 93.70753223
W21-P2-3   rs1005533  99.90044798     24.98883743 93.70753223
W21-P2-4   rs1005533  99.72375691     24.98883743 93.70753223
W21-P2-1   rs1005533  100             24.98883743 93.70753223
W21-P2-2   rs1005533  100             24.98883743 93.70753223
W21-P2-3   rs1005533  100             24.98883743 93.70753223
W21-P2-4   rs1005533  0               24.98883743 93.70753223
W15-P2-1   rs10092491 52.40641711     1.340954343 51.8604281
W15-P2-2   rs10092491 53.69923603     1.340954343 51.8604281
W15-P2-3   rs10092491 52.56689284     1.340954343 51.8604281
W15-P2-4   rs10092491 50.11764706     1.340954343 51.8604281
W15-P2-1   rs10092491 50.30094583     1.340954343 51.8604281
W15-P2-2   rs10092491 50.96277279     1.340954343 51.8604281
W15-P2-3   rs10092491 50.94102886     1.340954343 51.8604281
W15-P2-4   rs10092491 51.2849162      1.340954343 51.8604281
W21-P2-1   rs10092491 53.56976202     1.340954343 51.8604281
W21-P2-2   rs10092491 50.27861123     1.340954343 51.8604281
W21-P2-3   rs10092491 52.8358209      1.340954343 51.8604281
W21-P2-4   rs10092491 51.42585551     1.340954343 51.8604281
W21-P2-1   rs10092491 52.77890467     1.340954343 51.8604281
W21-P2-2   rs10092491 52.89017341     1.340954343 51.8604281
W21-P2-3   rs10092491 53.70786517     1.340954343 51.8604281
W21-P2-4   rs10092491 50              1.340954343 51.8604281
```

Because the average values in the last column (`AVG.MAF`) can be close to 100, adding the standard deviation pushes the error bars above 100.

Here is the code to create the plot:

```
pe1 = ggplot(half1, aes(x=Target.ID, y=AVG.MAF)) +
  geom_bar(stat = "identity", position = "dodge", colour = "black",
           width = 0.5, fill = "yellowgreen") +
  xlab("") +
  ylab("Average Major Allele Frequency") +
  labs(title="Allele Balance AmpliSeq Identity Sample P2") +
  geom_errorbar(aes(ymin = AVG.MAF-SD, ymax = AVG.MAF+SD),
                width = 0.4, position = position_dodge(0.9),
                size = 0.6) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))
```

I tried truncating the plot using `coord_cartesian`, but that cuts off the tops of the standard deviation bars.

Here is the code to create the plot with the standard deviation bars cut off:

```
pe1 = ggplot(half1, aes(x=Target.ID, y=AVG.MAF)) +
  geom_bar(stat = "identity", position = "dodge", colour = "black",
           width = 0.5, fill = "yellowgreen") +
  xlab("") +
  ylab("Average Major Allele Frequency") +
  labs(title="Allele Balance AmpliSeq Identity Sample P2") +
  geom_errorbar(aes(ymin = AVG.MAF-SD, ymax = AVG.MAF+SD),
                width = 0.4, position = position_dodge(0.9),
                size = 0.6) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5)) +
  coord_cartesian(ylim=c(0,100))
```

It seems like there has to be a way to restrict the standard deviation bars to my intended ymax of 100 and still keep the top horizontal bar visible in the plot. Does anyone know how to do this?


Answer Source

In addition to the issues people have raised in the comments, here are a couple of other considerations:

You don't need to add a column that repeats the mean for every row of your data. Instead, you can calculate and plot the mean within ggplot, using the actual data values in `Maj.Allele.Freq`. (In fact, by using a column for the y-value that repeats the mean value over and over for each `Target.ID`, you're actually plotting multiple copies of the mean bar, one on top of the other.)

You can also summarize the data (i.e., calculate the means and standard deviations) outside of ggplot and then use the summarized data frame for plotting. That's sometimes necessary in more complex situations, but you can do it all within ggplot here.

It seems to me points would work better than bars here.
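For reference, the "summarize outside of ggplot" route mentioned above could look roughly like this. It is only a sketch: the `half1` data frame here is a small hypothetical subset of the question's data (the first four rows per target), and the names `summ`, `means`, and `sds` are made up for illustration. The plotting step is skipped if `ggplot2` is not installed.

```r
# Hypothetical subset of the question's data: first four rows per Target.ID.
half1 <- data.frame(
  Target.ID = rep(c("rs1005533", "rs10092491"), each = 4),
  Maj.Allele.Freq = c(99.74811083, 100, 100, 100,
                      52.40641711, 53.69923603, 52.56689284, 50.11764706)
)

# Per-target mean and standard deviation, computed with base R.
means <- tapply(half1$Maj.Allele.Freq, half1$Target.ID, mean)
sds   <- tapply(half1$Maj.Allele.Freq, half1$Target.ID, sd)

# One row per Target.ID, so each bar is drawn exactly once.
summ <- data.frame(
  Target.ID = names(means),
  mean = as.numeric(means),
  sd = as.numeric(sds)
)

# Plot the summarized data frame (only if ggplot2 is available).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(summ, aes(x = Target.ID, y = mean)) +
    geom_col(fill = "yellowgreen", colour = "black", width = 0.5) +
    geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2) +
    labs(x = "", y = "Average Major Allele Frequency")
  # print(p) draws the plot
}
```

Because `summ` has a single row per `Target.ID`, this also avoids the stacked duplicate bars described above.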

The code below provides both the point and bar versions and also shows how to add either the standard deviation of the data or 95% confidence interval of the mean of the data. The blue lines represent the standard deviations, while the red lines represent the 95% confidence interval.

I've provided bootstrapped confidence intervals. To provide classical normal confidence intervals, switch from `mean_cl_boot` to `mean_cl_normal`.
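To make the "classical normal" interval concrete, here is a base-R sketch of what I understand `mean_cl_normal` to compute (a t-based interval around the mean). The function name `normal_ci` is made up for this illustration, and the values are four `Maj.Allele.Freq` readings for `rs10092491` taken from the question.

```r
# Hypothetical helper: t-based confidence interval for the mean,
# returned in the y/ymin/ymax shape that stat_summary(fun.data=...) expects.
normal_ci <- function(x, conf = 0.95) {
  n <- length(x)
  m <- mean(x)
  # Half-width: t quantile times the standard error of the mean.
  half <- qt((1 + conf) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(y = m, ymin = m - half, ymax = m + half)
}

x <- c(52.40641711, 53.69923603, 52.56689284, 50.11764706)
ci <- normal_ci(x)

# The same interval from the one-sample t-test, as a cross-check.
ci_ttest <- as.numeric(t.test(x)$conf.int)
```

The interval shrinks with the square root of the sample size, which is why the confidence interval of the mean is typically much narrower than the standard deviation of the data.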

If you want the y-axis to go down to zero, add `coord_cartesian(ylim=c(0,150))` or whatever maximum y-value you wish (as the comments discuss, to avoid a misleading graph, it should be above the top of the error bar, regardless of whether the bar represents the SD or CI).
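One way to pick that maximum without guessing is to compute the top of the tallest error bar first. This is a sketch using the `AVG.MAF` and `SD` values from the question; the names `summ`, `top`, and `upper`, and the choice to round up to the next multiple of 10, are assumptions for illustration.

```r
# Per-target mean and SD, copied from the question's AVG.MAF and SD columns.
summ <- data.frame(
  Target.ID = c("rs1005533", "rs10092491"),
  AVG.MAF = c(93.70753223, 51.8604281),
  SD = c(24.98883743, 1.340954343)
)

# Highest point reached by any mean + SD error bar.
top <- max(summ$AVG.MAF + summ$SD)

# Round up to the next multiple of 10 so the axis ends on a clean value.
upper <- ceiling(top / 10) * 10

# Then add to the plot:
#   coord_cartesian(ylim = c(0, upper))
```

With these values the tallest bar tops out a little under 119, so a limit of 100 would clip it.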

```
# Point version. mean_sdl and mean_cl_boot need the Hmisc package installed.
ggplot(half1, aes(x=Target.ID, y=Maj.Allele.Freq)) +
  # mean_sdl defaults to mean +/- 2 SD; mult=1 gives +/- 1 SD
  stat_summary(fun.data=mean_sdl, fun.args=list(mult=1), geom="errorbar",
               width=0.1, colour="blue") +
  stat_summary(fun.data=mean_sdl, geom="point", colour="blue", size=3) +
  stat_summary(fun.data = mean_cl_boot, colour="red", geom="errorbar", width=0.1) +
  stat_summary(fun.data = mean_cl_boot, colour="red", geom="point") +
  labs(x="", y="Average Major Allele Frequency",
       title="Allele Balance AmpliSeq\nIdentity Sample P2") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))
```

```
# Bar version. mean_sdl and mean_cl_boot need the Hmisc package installed.
ggplot(half1, aes(x=Target.ID, y=Maj.Allele.Freq)) +
  # Use fun.y=mean instead of fun=mean on ggplot2 versions before 3.3.0
  stat_summary(fun=mean, geom="bar", fill="yellowgreen", colour="black") +
  # mean_sdl defaults to mean +/- 2 SD; mult=1 gives +/- 1 SD
  stat_summary(fun.data=mean_sdl, fun.args=list(mult=1), geom="errorbar",
               width=0.1, size=1, colour="blue") +
  stat_summary(fun.data = mean_cl_boot, colour="red", geom="errorbar",
               width=0.1, size=0.7) +
  labs(x="", y="Average Major Allele Frequency",
       title="Allele Balance AmpliSeq\nIdentity Sample P2") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))
```

You could also put both the SD and 95% CI on the same plot:

```
# Nudge the SD and CI markers apart so they don't overlap
pnp = position_nudge(x=0.1)
pnm = position_nudge(x=-0.1)
ggplot(half1, aes(x=Target.ID, y=Maj.Allele.Freq)) +
  # mean_sdl defaults to mean +/- 2 SD; mult=1 gives +/- 1 SD
  stat_summary(fun.data=mean_sdl, fun.args=list(mult=1), geom="errorbar",
               width=0.1, position=pnp, aes(colour="SD")) +
  stat_summary(fun.data=mean_sdl, geom="point", position=pnp, aes(colour="SD")) +
  stat_summary(fun.data = mean_cl_boot, geom="errorbar", width=0.1,
               position=pnm, aes(colour="95% CI")) +
  stat_summary(fun.data = mean_cl_boot, geom="point", position=pnm,
               aes(colour="95% CI")) +
  labs(x="", y="Average Major Allele Frequency", colour="",
       title="Allele Balance AmpliSeq\nIdentity Sample P2") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))
```
