upendra upendra - 3 months ago 8
R Question

How to populate a data frame based on condition in R

I created an empty data frame, something like this:

id Alyr Crub Lala Brap Bole Spar Esal Aara Thas
1 XLOC_003940_TBH_1 NA NA NA NA NA NA NA NA NA


I want to check whether the id and the column name match and, if they do, replace "NA" with a certain value. Here is an example:

ex1 <- "Alyr_XLOC_003940_TBH_1_Ortholog_Known_Gene_Sense"

sp <- sub("([A-Za-z]+)_(XLOC_\\d+_TBH_1)_([A-Za-z_]+)","\\1", ex1)
gene <- sub("([A-Za-z]+)_(XLOC_\\d+_TBH_1)_([A-Za-z_]+)","\\2", ex1)
fun <- sub("([A-Za-z]+)_(XLOC_\\d+_TBH_1)_([A-Za-z_]+)","\\3", ex1)


Based on the above example, I want to get something like this:

id Alyr Crub Lala Brap Bole Spar Esal Aara Thas
1 XLOC_003940_TBH_1 Ortholog_Known_Gene_Sense NF NF NF NF NF NF NF NF


I am stuck here and can't figure out how to do this.

Answer

Use matrix subsetting:

df1$id <- gene
df1[cbind(1:nrow(df1), match(sp, names(df1)))] <- fun

Check this answer for more on subsetting a data frame by a two-column matrix.
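
As a rough illustration of why the two-column matrix works: each row of the index matrix is read as one (row, column) pair, so a single assignment can target one cell per row. The sketch below uses a made-up data frame (toy) and index (idx) that are not part of the question; they are only there to show the mechanics.

```r
# Hypothetical toy data frame, used only to illustrate the indexing
toy <- data.frame(a = c("x", "y"), b = c("p", "q"), stringsAsFactors = FALSE)

# Each row of the index matrix is one (row, column) pair:
# row 1 / column 2 (b) and row 2 / column 1 (a)
idx <- cbind(c(1, 2), c(2, 1))

# Reading with the matrix returns one value per pair
cells <- toy[idx]

# Writing with the matrix assigns one value per pair
toy[idx] <- c("NEW1", "NEW2")
toy
```

In the answer's code, match(sp, names(df1)) supplies the column number for each row, which is what makes up the second column of the index matrix.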

##Example
nms <- scan(what="character", text="id Alyr Crub Lala Brap Bole Spar Esal Aara Thas")
df1 <- as.data.frame(matrix(NA, 3, 10))
names(df1) <- nms
df1
#  id Alyr Crub Lala Brap Bole Spar Esal Aara Thas
#1 NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
#2 NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
#3 NA   NA   NA   NA   NA   NA   NA   NA   NA   NA


ex1 <- c("Alyr_XLOC_003940_TBH_1_Ortholog_Gene",
         "Lala_XLOC_1234_TBH_1_Lalala_Gene",
         "Thas_XLOC_5678_TBH_1_Thasthas_Gene")

sp <- sub("([A-Za-z]+)_(XLOC_\\d+_TBH_1)_([A-Za-z_]+)","\\1", ex1)
gene <- sub("([A-Za-z]+)_(XLOC_\\d+_TBH_1)_([A-Za-z_]+)","\\2", ex1)
fun <- sub("([A-Za-z]+)_(XLOC_\\d+_TBH_1)_([A-Za-z_]+)","\\3", ex1)

df1$id <- gene
df1[cbind(1:nrow(df1), match(sp, names(df1)))] <- fun
df1
  #                  id          Alyr Crub        Lala Brap Bole Spar Esal Aara          Thas
  # 1 XLOC_003940_TBH_1 Ortholog_Gene   NA        <NA>   NA   NA   NA   NA   NA          <NA>
  # 2   XLOC_1234_TBH_1          <NA>   NA Lalala_Gene   NA   NA   NA   NA   NA          <NA>
  # 3   XLOC_5678_TBH_1          <NA>   NA        <NA>   NA   NA   NA   NA   NA Thasthas_Gene
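
The desired output in the question shows "NF" rather than NA in the cells that did not match. The answer above does not cover that step; one possible way to finish, sketched here on a small rebuilt copy of the result (an assumption made so the snippet runs on its own), is to overwrite the remaining NA cells afterwards:

```r
# Small stand-in for the filled data frame, rebuilt so this runs on its own
df1 <- data.frame(id   = c("XLOC_003940_TBH_1", "XLOC_1234_TBH_1"),
                  Alyr = c("Ortholog_Gene", NA),
                  Crub = c(NA, NA),
                  Lala = c(NA, "Lalala_Gene"),
                  stringsAsFactors = FALSE)

# is.na() on a data frame gives a logical matrix; use it to replace
# every remaining NA with the placeholder "NF"
df1[is.na(df1)] <- "NF"
df1
```

This should be run only after all matches have been filled in, since it also turns the untouched logical NA columns into character columns.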