R Question

replace and remove part of string in rownames

I want to remove part of the rownames in my data frame. Everything that does not match one of the strings in the grepl pattern below should be removed, so that each rowname is replaced by the matching string (lncRNA, snRNA, snoRNA or precursor_RNA). Does anyone know how to do this?

df[grepl(".*lncRNA.*|.*snRNA.*|.*snoRNA.*|.*precursor_RNA.*", rownames(df))] <- c("lncRNA","snRNA","snoRNA","precursor_RNA")


For example, a rowname like

[3212] "URS000075B261-precursor_RNA_CTTTCTATGCTCCTGTTCTGC"

should become just the RNA type, so the desired output is:

[3208] "snoRNA"
[3209] "snRNA"
[3210] "snRNA"
[3211] "lncRNA"
[3212] "precursor_RNA"
[3213] "lncRNA"


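We can use gsub with a pattern that has two alternatives separated by or (|). The first is one or more characters that are not a - ([^-]+) from the start (^) of the string, followed by a -. The second is an underscore followed by one or more characters that are not an underscore ([^_]+) until the end of the string ($). Every match is replaced with blanks (""), so only the RNA type in the middle is left. Here v1 is a character vector holding the names.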

gsub("^[^-]+-|_[^_]+$", "", v1)
#[1] "snoRNA"        "snRNA"         "snRNA"         "lncRNA"       
#[5] "precursor_RNA" "lncRNA"  
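To see what each alternative removes, it can help to apply the two halves of the pattern separately with sub() to the example rowname from the question. This is only an illustration on that one string:

```r
x <- "URS000075B261-precursor_RNA_CTTTCTATGCTCCTGTTCTGC"

# Left alternative: the ID prefix up to and including the first "-"
sub("^[^-]+-", "", x)
#[1] "precursor_RNA_CTTTCTATGCTCCTGTTCTGC"

# Right alternative: the last "_" and the sequence after it
sub("_[^_]+$", "", x)
#[1] "URS000075B261-precursor_RNA"

# gsub() removes both matches, leaving only the RNA type
gsub("^[^-]+-|_[^_]+$", "", x)
#[1] "precursor_RNA"
```

This assumes the ID part contains no - and the trailing sequence contains no underscore. precursor_RNA survives intact because its own underscore is not the last one in the string.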

If we are doing this on the rownames of the data frame directly:

gsub("^[^-]+-|_[^_]+$", "", rownames(df))
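Note that gsub only returns the cleaned names; it does not modify df. Since several rows share the same RNA type, assigning the result straight back with rownames(df) <- ... fails, because data frames do not allow duplicate row names. A minimal sketch on a small made-up data frame (the count column and the second and third rownames are hypothetical) stores the type in a new column and, if row names are still wanted, de-duplicates them with make.unique():

```r
# Hypothetical data: the first rowname is from the question, the others are invented
df <- data.frame(
  count = c(5, 3, 8),
  row.names = c("URS000075B261-precursor_RNA_CTTTCTATGCTCCTGTTCTGC",
                "URSEXAMPLE01-snRNA_ACGTACGT",
                "URSEXAMPLE02-snRNA_TTGACCA")
)

# Keep the RNA type as a regular column (duplicate values are fine here)
df$type <- gsub("^[^-]+-|_[^_]+$", "", rownames(df))

# Row names must be unique, so make.unique() appends ".1", ".2", ... to repeats
rownames(df) <- make.unique(df$type)
rownames(df)
#[1] "precursor_RNA" "snRNA"         "snRNA.1"
```

Keeping the type in a column is usually easier to work with afterwards (for example for grouping or table()), since the suffixed row names no longer match the plain type strings.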