Yatrosin Yatrosin - 1 year ago 105
R Question

How to show chromosome position and SNP label at the same time in x axis

I'm trying to plot a certain number of SNPs and have in the x-axis both, the chromosomal position and their labels.

I have imported the SNP information from a bed file into a GRanges object.

My bed file looks like this:

chr17 78191000 78191000 rsAAA 1 +
chr17 78191900 78191900 rsBBB 1 +
chr17 78194002 78194002 rsCCC 1 +
chr17 78197170 78197170 rsDDD 1 +

The function I used to convert the bed file into a GRanges object is the one from this website: http://davetang.org/muse/2015/02/04/bed-granges/

bed_to_granges <- function(file){
df <- read.table(file,

if(length(df) > 6){
df <- df[,-c(7:length(df))]

stop("File has less than 3 columns")

header <- c('chr','start','end','id','score','strand')
names(df) <- header[1:length(names(df))]

if('strand' %in% colnames(df)){
df$strand <- gsub(pattern="[^+-]+", replacement = '*', x = df$strand)


gr <- with(df, GRanges(chr, IRanges(start, end)))
} else if (length(df)==4){
gr <- with(df, GRanges(chr, IRanges(start, end), id=id))
} else if (length(df)==5){
gr <- with(df, GRanges(chr, IRanges(start, end), id=id, score=as.character(score)))
} else if (length(df)==6){
gr <- with(df, GRanges(chr, IRanges(start, end), id=id, score=as.character(score), strand=strand))

The code to import the bed file and reformat it according to the human hg19 build is the following one:

data(hg19Ideogram, package = "biovizBase")

## Import bed file as GRanges file
SNP <- bed_to_granges("SNP_position.bed")
seqlengths(SNP) <- seqlengths(hg19Ideogram)[names(seqlengths(SNP))]
SNP_dn <- keepSeqlevels(SNP, paste0("chr", c(1:22, "X", "Y")))

I have tried to plot the SNPs in the following ways:

SNP_location <- autoplot(SNP_dn) +
theme(text = element_text(size=8),
axis.text.x = element_text(angle=45, hjust=1)) +
theme(legend.position="none") +
xlim(78190000,78200000) +
fixed(SNP_location) <- TRUE

This code returns a plot with the chromosomal positions in the x-axis and the SNPs in their correct location.

SNP_IDs <- autoplot(SNP_dn) +
scale_x_continuous(name = "\nSNP IDs",
breaks = as.vector(start(SNP_dn)),
labels = as.factor (SNP_dn$id)) +
theme(text = element_text(size=8),
axis.text.x = element_text(angle=45, hjust=1)) +
theme(legend.position="none") +
fixed(SNP_IDs) <- TRUE

This code returns a re-scaled x-axis where the x-axis ticks correspond to the positions and labels of the SNPs themselves, but I loose the chromosomal references.

I would like to get a figure like the first one with the x-axis scaled according to chromosomal positions with a second line located anywhere containing the SNP names in the same figure.

I want to combine this figure afterwards with other plots showing other features from the same region with the ggbio tracks function and in order to do so they need to have the same chromosomal limits.

Is there an easy way to label SNPs keeping the original x-axis in chromosomal scale?

Thank you very much,


R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils datasets methods
[10] base

other attached packages:
[1] Homo.sapiens_1.3.1 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[3] org.Hs.eg.db_3.3.0 GO.db_3.3.0
[5] OrganismDbi_1.14.1 GenomicFeatures_1.24.5
[7] AnnotationDbi_1.34.4 Biobase_2.32.0
[9] GenomicRanges_1.24.2 GenomeInfoDb_1.8.3
[11] IRanges_2.6.1 S4Vectors_0.10.3
[13] biovizBase_1.20.0 ggbio_1.20.2
[15] ggplot2_2.1.0 BiocGenerics_0.18.0

loaded via a namespace (and not attached):
[1] Rcpp_0.12.6 lattice_0.20-33 Rsamtools_1.24.0
[4] Biostrings_2.40.2 digest_0.6.10 mime_0.5
[7] R6_2.1.3 plyr_1.8.4 chron_2.3-47
[10] acepack_1.3-3.3 RSQLite_1.0.0 httr_1.2.1
[13] BiocInstaller_1.22.3 zlibbioc_1.18.0 data.table_1.9.6
[16] rpart_4.1-10 Matrix_1.2-6 labeling_0.3
[19] splines_3.3.1 BiocParallel_1.6.6 AnnotationHub_2.4.2
[22] stringr_1.1.0 foreign_0.8-66 RCurl_1.95-4.8
[25] biomaRt_2.28.0 munsell_0.4.3 shiny_0.13.2
[28] httpuv_1.3.3 rtracklayer_1.32.2 htmltools_0.3.5
[31] nnet_7.3-12 SummarizedExperiment_1.2.3 gridExtra_2.2.1
[34] interactiveDisplayBase_1.10.3 Hmisc_3.17-4 XML_3.98-1.4
[37] reshape_0.8.5 GenomicAlignments_1.8.4 bitops_1.0-6
[40] RBGL_1.48.1 xtable_1.8-2 GGally_1.2.0
[43] gtable_0.2.0 DBI_0.5 magrittr_1.5
[46] scales_0.4.0 graph_1.50.0 stringi_1.1.1
[49] XVector_0.12.1 reshape2_1.4.1 latticeExtra_0.6-28
[52] Formula_1.2-1 RColorBrewer_1.1-2 ensembldb_1.4.7
[55] tools_3.3.1 dichromat_2.0-0 BSgenome_1.40.1
[58] survival_2.39-5 colorspace_1.2-6 cluster_2.0.4
[61] VariantAnnotation_1.18.7

Answer Source

I think I have found the parameter I was looking for: it is all about working with the geom_text() function. You can generate a int vector with the positions of the SNPs and a chr vector for the SNP names. Afterwards adding + geom_text(x = int_vector, y = rep(1.3,4), label = chr_vector, angle = 45, hjust = -0.4, vjust = 0.2, size = 3) would make it. There might be easier ways of doing it and I would appreciate it, if you share them.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download