I have a set of SNPs from different parts of the genome and their allele frequencies in various populations and metapopulations of interest. I want to plot the allele frequencies along the SNPs' genomic coordinates for all 22 autosomes.
Basically, I want to generate something like Figure 1A from Sankararaman et al. (2014) (http://www.nature.com/nature/journal/v507/n7492/fig_tab/nature12961_F1.html), except that the Y-axis would be frequency, all populations would be on the same graph (not separated), and I would have colored points instead of spikes.
My data is formatted as follows (MAF = minor allele frequency, which is what I want to graph):
CHR SNP COORD CLST A1 A2 MAF MAC NCHROBS
1 rs16823303 2903159 Region G A 0.01887 4 212
For a simple plot of the coordinates versus frequency here's an example:
#Example data:
MAF=runif(1000,min=0,max=1)
COORD=runif(1000,min=0,max=100000)
test.df=data.frame(COORD,MAF)
#plot
plot(test.df$COORD,test.df$MAF)
For your own plot you won't need the example data, but you will need to substitute your own table name in the plot call.
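As a rough sketch of that substitution, assuming the table is whitespace-delimited with the header shown in the question: the data frame name snp.df is hypothetical, and the inline text below just stands in for reading your real file with read.table("your_file", header = TRUE), where the file name is a placeholder.

```r
# snp.df is a hypothetical name; the single row is the example from the question.
# With a real file you would call read.table() on the file path instead of text=.
snp.df <- read.table(text = "
CHR SNP COORD CLST A1 A2 MAF MAC NCHROBS
1 rs16823303 2903159 Region G A 0.01887 4 212
", header = TRUE, stringsAsFactors = FALSE)

# Same base-graphics call as above, with the real column names substituted
plot(snp.df$COORD, snp.df$MAF,
     xlab = "Coordinate (bp)", ylab = "MAF")
```

The only change from the toy example is the data frame name; COORD and MAF already match the column names in the question's table.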
If you need to beautify it with colors/labels etc. that can be done too:
plot(test.df$COORD,test.df$MAF, col="red", pch=18)
library(ggplot2)
p=ggplot(test.df,aes(COORD,MAF))
p + geom_point()
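To get closer to what the question describes (all 22 autosomes, with populations as colored points on the same axes), one option is to facet by chromosome and map the CLST column to color. This is only a sketch on simulated data: sim.df, the population labels "PopA"/"PopB", and the 1e8 bp chromosome length are made up for illustration, and facet_wrap is just one possible layout, not necessarily how the Sankararaman et al. figure was produced.

```r
library(ggplot2)

# Simulated data (not real SNPs): 100 points on each of 22 autosomes,
# randomly assigned to two made-up populations.
set.seed(1)
n <- 2200
sim.df <- data.frame(
  CHR   = rep(1:22, each = n / 22),
  COORD = runif(n, min = 0, max = 1e8),
  CLST  = sample(c("PopA", "PopB"), n, replace = TRUE),
  MAF   = runif(n, min = 0, max = 0.5)
)

# One panel per chromosome; points colored by population (CLST)
p <- ggplot(sim.df, aes(x = COORD, y = MAF, colour = CLST)) +
  geom_point(size = 0.5) +
  facet_wrap(~ CHR, ncol = 4, scales = "free_x") +
  labs(x = "Coordinate (bp)", y = "Minor allele frequency",
       colour = "Population")
print(p)
```

With real data, the same ggplot call should work on a data frame that has CHR, COORD, CLST, and MAF columns; scales = "free_x" lets each chromosome panel use its own coordinate range.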