leftisthominid - 3 months ago
R Question

Good way to graph allele frequency of different SNPs along chromosomes

I have a set of SNPs from different parts of the genome and their allele frequencies in various populations and metapopulations of interest. I want to plot the allele frequencies along the SNPs' genomic coordinates for all 22 autosomes.

Basically, I want to generate something like Figure 1A from Sankararaman et al. (2014) (http://www.nature.com/nature/journal/v507/n7492/fig_tab/nature12961_F1.html), except that the Y-axis would be frequency, all populations would be on the same graph (not separated), and I would have colored points instead of spikes.

My data are formatted as follows (MAF = minor allele frequency, which is what I want to graph):

CHR  SNP         COORD    CLST    A1  A2  MAF      MAC  NCHROBS
1    rs16823303  2903159  Region  G   A   0.01887  4    212


(It lists all the SNPs for one region, then all the SNPs for the next region, and so forth.)

Any suggestions on how to do this in R? Thanks!

Answer

For a simple plot of coordinate versus frequency, here's an example:

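# Example data: 1000 random frequencies at random coordinates
MAF <- runif(1000, min = 0, max = 1)
COORD <- runif(1000, min = 0, max = 100000)
test.df <- data.frame(COORD, MAF)

# Scatter plot of frequency (y) against coordinate (x)
plot(test.df$COORD, test.df$MAF)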

For your own plot you won't need the example data; just substitute the name of your data frame for test.df.

If you want to dress it up with colors, point shapes, and so on, that can be done too:

plot(test.df$COORD,test.df$MAF, col="red", pch=18)

OR

library(ggplot2)
p <- ggplot(test.df, aes(x = COORD, y = MAF))
p + geom_point()
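
To get closer to what the question asks for (all 22 autosomes on one axis, with one color per population), here is a rough base R sketch. It uses simulated data with the question's CHR, COORD, CLST and MAF columns. The region names, the POS column, the rainbow() palette, and the use of the largest observed coordinate as a stand-in for chromosome length are all assumptions for illustration, not something from the original data.

```r
# Simulated data with the same column names as the question (values are made up)
set.seed(1)
n <- 600
snps <- data.frame(
  CHR   = sample(1:22, n, replace = TRUE),
  COORD = round(runif(n, min = 1, max = 5e7)),
  CLST  = sample(c("RegionA", "RegionB", "RegionC"), n, replace = TRUE),
  MAF   = runif(n, min = 0, max = 0.5)
)

# Approximate each chromosome's length by its largest observed coordinate,
# then shift every chromosome by the summed lengths of those before it.
chr_len <- tapply(snps$COORD, snps$CHR, max)
chr_len <- chr_len[order(as.integer(names(chr_len)))]
offset  <- c(0, cumsum(as.numeric(chr_len))[-length(chr_len)])
names(offset) <- names(chr_len)
snps$POS <- snps$COORD + unname(offset[as.character(snps$CHR)])

# One color per population, all populations drawn on the same axes
clst <- factor(snps$CLST)
pal  <- rainbow(nlevels(clst))
cols <- pal[as.integer(clst)]
plot(snps$POS, snps$MAF, col = cols, pch = 18,
     xaxt = "n", xlab = "Chromosome", ylab = "MAF")

# Label each chromosome at its midpoint and mark the chromosome boundaries
axis(1, at = offset + as.numeric(chr_len) / 2,
     labels = names(chr_len), cex.axis = 0.6)
abline(v = offset[-1], col = "grey80")
legend("topright", legend = levels(clst), col = pal, pch = 18)
```

With real data you would drop the simulation step and read your table into snps instead. If you have the actual chromosome lengths for your genome build, using those for chr_len would place the boundaries more accurately than the observed maxima.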