James James - 1 month ago 8
R Question

How can I convert gene symbol to Ensembl ID and uniprot_swissprot in R?

I have a list of genes with their P-value and fold change values as a matrix.

Symbols Entrez_IDs logFC AveExpr t P.Value adj.P.Val B
7987405 RASGRP1 10125 -9.924e-01 6.937 -5.467e+00 7.496e-07 0.01147 5.41279
8095728 EREG 2069 7.046e-01 5.467 5.302e+00 1.420e-06 0.01147 4.85944
7908397 RGS13 6003 6.332e-01 4.092 5.033e+00 3.949e-06 0.01728 3.97307
8176306 CSF2RA 1438 4.693e-01 5.085 5.012e+00 4.277e-06 0.01728 3.90397
8115355 GLRA1 2741 -1.548e+00 6.759 -4.928e+00 5.861e-06 0.01894 3.63094
7963826 PPP1R1A 5502 -9.774e-01 9.411 -4.710e+00 1.315e-05 0.03136 2.93060
7996022 CCL22 6367 6.668e-01 5.927 4.701e+00 1.358e-05 0.03136 2.90275
8139087 SFRP4 6424 1.520e+00 4.797 4.453e+00 3.340e-05 0.05467 2.12401
7929344 FFAR4 338557 -8.247e-01 6.682 -4.409e+00 3.908e-05 0.05467 1.98812
8119338 GLP1R 2740 -8.666e-01 8.111 -4.399e+00 4.052e-05 0.05467 1.95698
8100977 CXCL5 6374 6.301e-01 7.856 4.337e+00 5.047e-05 0.05467 1.76699
8104901 IL7R 3575 9.732e-01 4.962 4.331e+00 5.158e-05 0.05467 1.74821
8104570 FAM105A 54491 -9.411e-01 8.692 -4.330e+00 5.164e-05 0.05467 1.74718
8126244 LRFN2 57497 -7.189e-01 6.223 -4.317e+00 5.409e-05 0.05467 1.70720
7983630 FGF7 2252 1.032e+00 5.146 4.303e+00 5.685e-05 0.05467 1.66416
7919326 ACP6 51205 -4.909e-01 7.686 -4.302e+00 5.714e-05 0.05467 1.65977
7975268 ARG2 384 -9.104e-01 7.787 -4.273e+00 6.315e-05 0.05467 1.57340
7972021 TBC1D4 9882 -4.516e-01 7.663 -4.257e+00 6.684e-05 0.05467 1.52441
7938951 ANO5 203859 -6.176e-01 7.468 -4.230e+00 7.358e-05 0.05467 1.44148
7948881 WDR74 54663 4.599e-01 8.874 4.223e+00 7.532e-05 0.05467 1.42124
8120362 BEND6 221336 -5.006e-01 5.247 -4.220e+00 7.594e-05 0.05467 1.41416
8071953 SGSM1 129049 -4.729e-01 6.618 -4.216e+00 7.716e-05 0.05467 1.40042
8081548 NECTIN3 25945 -5.347e-01 8.841 -4.200e+00 8.144e-05 0.05467 1.35383
8154135 SLC1A1 6505 7.325e-01 8.062 4.183e+00 8.656e-05 0.05467 1.30118


I want to convert them to Ensembl gene IDs and uniprot_swissprot accessions. I tried the following code, but I got an error each time:

library(biomaRt)
mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
attributes=c('ensembl_gene_id','ensembl_transcript_id','hgnc_symbol', 'uniprot_swissprot')
genes <- rma_final$genes
rma_final<-rma_final[,-10]
G_list<- getBM(attributes=attributes, filters="hugene10stv1",values=genes, mart=mart, uniqueRows=T)


Error in getBM(attributes = attributes, filters = "hugene10stv1", values = genes, :
Values argument contains no data.


I also tried this command:

G_list<- getBM(attributes=attributes, filters="hugene10stv1",
values=rma_final$Symbols , mart=mart, uniqueRows=T)


but I got an error as well:

Error in getBM(attributes = attributes, filters = "hugene10stv1", values = rma_final$Entrez_IDs, : Invalid filters(s): hugene10stv1

Any help will be highly appreciated.

Answer

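"hugene10stv1" is not a valid filter name for this mart, which is what the second error reports. Use "hgnc_symbol" as the filter for the gene symbols: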

genes <- c("RASGRP1","EREG")
G_list <- getBM(attributes=attributes, filters="hgnc_symbol", values=genes,
    mart=mart, uniqueRows=TRUE)

   ensembl_gene_id ensembl_transcript_id hgnc_symbol uniprot_swissprot
1  ENSG00000172575       ENST00000310803     RASGRP1            O95267
2  ENSG00000172575       ENST00000558432     RASGRP1                  
3  ENSG00000172575       ENST00000561180     RASGRP1  
...
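
Because the query returns one row per transcript, the result usually needs collapsing before it can be joined back onto the results table. The following is only a minimal sketch of one way to do that in base R, not part of the original answer. It runs offline: `G_list` here is a hand-built stand-in containing just the three rows printed above, `rma_final` is a stand-in holding the first two rows of the question's table, and `per_gene` and `annotated` are illustrative names. It assumes that keeping a single row per symbol (preferring one with a Swiss-Prot accession) is acceptable, which will not hold for genes that map to several Ensembl IDs or accessions.

```r
# Stand-in for the getBM() result: the three rows shown in the output above.
G_list <- data.frame(
  ensembl_gene_id       = rep("ENSG00000172575", 3),
  ensembl_transcript_id = c("ENST00000310803", "ENST00000558432", "ENST00000561180"),
  hgnc_symbol           = rep("RASGRP1", 3),
  uniprot_swissprot     = c("O95267", "", ""),
  stringsAsFactors      = FALSE
)

# Stand-in for the results table: first two rows of the question's data.
rma_final <- data.frame(
  Symbols          = c("RASGRP1", "EREG"),
  Entrez_IDs       = c(10125, 2069),
  logFC            = c(-0.9924, 0.7046),
  stringsAsFactors = FALSE
)

# Drop the transcript column and treat empty accessions as missing.
per_gene <- G_list[, c("hgnc_symbol", "ensembl_gene_id", "uniprot_swissprot")]
per_gene$uniprot_swissprot[per_gene$uniprot_swissprot == ""] <- NA

# Sort so rows with an accession come first within each symbol,
# then keep the first row per symbol (assumption: one row per gene is enough).
per_gene <- per_gene[order(per_gene$hgnc_symbol, is.na(per_gene$uniprot_swissprot)), ]
per_gene <- per_gene[!duplicated(per_gene$hgnc_symbol), ]

# Left join: every row of the results table is kept; genes without a match get NA.
annotated <- merge(rma_final, per_gene,
                   by.x = "Symbols", by.y = "hgnc_symbol", all.x = TRUE)
print(annotated)
```

With the real data, `G_list` would come from the `getBM()` call above using `values=rma_final$Symbols`. EREG ends up with `NA` in this sketch only because the stand-in `G_list` has no rows for it.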