VasoGene - 11 days ago
R Question

Arranging data in rows in R

I have the OMIM gene list (about 15,000 genes) with corresponding diseases that looks like this:

SLC6A8,CRTR,CCDS1 Cerebral creatine deficiency syndrome 1, 300352 (3)
BCAP31,BAP31,DXS1357E,DDCH Deafness, dystonia, and cerebral hypomyelination
ABCD1,ALD,AMN Adrenoleukodystrophy, 300100 (3), X-linked recessive
PLXNB3,PLXN6 NA


Some diseases have more than one gene name associated with them. I would like to reorganize the list so that each row holds a single gene name and its associated disease:

SLC6A8 Cerebral creatine deficiency syndrome 1, 300352 (3)
CRTR Cerebral creatine deficiency syndrome 1, 300352 (3)
CCDS1 Cerebral creatine deficiency syndrome 1, 300352 (3)


Could this be done in R?

Answer

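I'm not entirely sure what sort of data structure you have, so this quick solution assumes a data.frame df with the comma-separated gene names in column a and the disease in column b. It uses ldply from plyr and the %>% pipe from magrittr: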

library(plyr)
library(magrittr)
splitFn <- function(x) expand.grid(df[x,"a"] %>% as.character %>% strsplit(., ",") %>% unlist, df[x, "b"])
ldply(1:nrow(df), splitFn)

       Var1                                                Var2
1    SLC6A8  Cerebral creatine deficiency syndrome 1, 300352(3)
2      CRTR  Cerebral creatine deficiency syndrome 1, 300352(3)
3     CCDS1  Cerebral creatine deficiency syndrome 1, 300352(3)
4    BCAP31    Deafness, dystonia, and cerebral hypomyelination
5     BAP31    Deafness, dystonia, and cerebral hypomyelination
6  DXS1357E    Deafness, dystonia, and cerebral hypomyelination
7      DDCH    Deafness, dystonia, and cerebral hypomyelination
8     ABCD1 Adrenoleukodystrophy, 300100(3), X-linked recessive
9       ALD Adrenoleukodystrophy, 300100(3), X-linked recessive
10      AMN Adrenoleukodystrophy, 300100(3), X-linked recessive
11   PLXNB3                                                <NA>
12    PLXN6                                                <NA>

The data.frame I used:

df <- structure(list(a = structure(c(4L, 2L, 1L, 3L), .Label = c("ABCD1,ALD,AMN", 
"BCAP31,BAP31,DXS1357E,DDCH", "PLXNB3,PLXN6", "SLC6A8,CRTR,CCDS1"
), class = "factor"), b = structure(c(1L, 3L, 2L, NA), .Label = c(" Cerebral creatine deficiency syndrome 1, 300352(3)", 
"Adrenoleukodystrophy, 300100(3), X-linked recessive", "Deafness, dystonia, and cerebral hypomyelination"
), class = "factor")), .Names = c("a", "b"), row.names = c(NA, 
-4L), class = "data.frame")
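
If you would rather avoid extra packages, the same reshaping can be sketched in base R. This is only a rough sketch under assumptions that are mine, not from the question: the raw file is read as plain text lines (the lines vector below is a hypothetical stand-in for readLines on your file), and the gene list is separated from the disease by the first run of whitespace. The names lines, genes, disease, gene_list and long are made up for illustration.

```r
# Hypothetical input: raw lines as shown in the question. Assumption: the
# comma-separated gene list never contains whitespace, so the first run of
# whitespace separates it from the disease text.
lines <- c(
  "SLC6A8,CRTR,CCDS1 Cerebral creatine deficiency syndrome 1, 300352 (3)",
  "BCAP31,BAP31,DXS1357E,DDCH Deafness, dystonia, and cerebral hypomyelination",
  "ABCD1,ALD,AMN Adrenoleukodystrophy, 300100 (3), X-linked recessive",
  "PLXNB3,PLXN6 NA"
)

# Everything before the first whitespace is the gene list;
# everything after it is the disease.
genes   <- sub("\\s.*$", "", lines)
disease <- sub("^\\S+\\s+", "", lines)

# Treat the literal string "NA" as a missing disease.
disease[disease == "NA"] <- NA_character_

# Split each gene list on commas, then repeat each disease once per gene
# so that both columns line up row by row.
gene_list <- strsplit(genes, ",", fixed = TRUE)
long <- data.frame(
  gene    = unlist(gene_list),
  disease = rep(disease, lengths(gene_list)),
  stringsAsFactors = FALSE
)

print(long)
```

The key step is rep(disease, lengths(gene_list)), which repeats each disease as many times as its row has gene names. If your data is already in a data.frame like df above, you could apply the same two steps to as.character(df$a) and as.character(df$b) instead of parsing lines.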