Asked by MAPK, 4 months ago
R Question

How to delete everything before the first delimiter and after the last delimiter in R

I have this vector called myvec. I want to delete everything before the first delimiter (_) and everything after the last delimiter (_), including the delimiters themselves. How do I do this in R?

myvec <- c("contamination_LPH-001-10_3.txt", "contamination_LPH-001-10_AK1_0.txt",
           "contamination_LPH-001-10_AK2_1.txt", "contamination_LPH-001-10_PD_2.txt")
# (a fifth element containing "_SCC_" was cut off in the original post)


Expected output: LPH-001-10, LPH-001-10_AK1, LPH-001-10_AK2, LPH-001-10_PD, LPH-001-10_SCC


We can use gsub for this:

gsub("^[^_]*_|_[^_]*$", "", myvec)
#[1] "LPH-001-10"     "LPH-001-10_AK1" "LPH-001-10_AK2" 
#[4] "LPH-001-10_PD"  "LPH-001-10_SCC"

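From the start (^) of the string, we match zero or more characters that are not _ ([^_]*) followed by a _, or (|) we match a _ followed by zero or more characters that are not _ ([^_]*) up to the end ($) of the string, and replace the match with "". Because gsub replaces every match rather than only the first, both the leading prefix and the trailing suffix are removed in one call.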
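To see exactly which pieces the gsub pattern deletes, a small sketch like the following can help. It assumes only the four complete elements of myvec from the question (the fifth was cut off) and uses base R's gregexpr() and regmatches() to extract every match of the same pattern.

```r
myvec <- c("contamination_LPH-001-10_3.txt", "contamination_LPH-001-10_AK1_0.txt",
           "contamination_LPH-001-10_AK2_1.txt", "contamination_LPH-001-10_PD_2.txt")

pattern <- "^[^_]*_|_[^_]*$"

# gregexpr() locates every match of the pattern in each string;
# regmatches() pulls out the matched text, i.e. the pieces gsub() deletes
removed <- regmatches(myvec, gregexpr(pattern, myvec))
removed[[1]]
#[1] "contamination_" "_3.txt"

# Deleting those matches leaves the middle part of each string
kept <- gsub(pattern, "", myvec)
kept
```

Each string yields two matches: the prefix up to the first _ and the suffix from the last _ onward.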

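Alternatively, we can use a capture group ((...)) and replace the whole string with a backreference to that group (\\1). The leading ^[^_]*_ consumes everything up to the first _, and since the trailing [^_]*$ cannot contain a _, the _ before it must be the last one, so (.*) captures everything between the first and last delimiters.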

sub("^[^_]*_(.*)_[^_]*$", "\\1", myvec)
#[1] "LPH-001-10"     "LPH-001-10_AK1" "LPH-001-10_AK2" 
#[4] "LPH-001-10_PD"  "LPH-001-10_SCC"
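A regex-free alternative (a different technique from the answer above, shown only as a sketch) is to split each string on the literal _ with strsplit(), drop the first and last pieces, and paste the middle pieces back together. Again this assumes only the four complete elements of myvec.

```r
myvec <- c("contamination_LPH-001-10_3.txt", "contamination_LPH-001-10_AK1_0.txt",
           "contamination_LPH-001-10_AK2_1.txt", "contamination_LPH-001-10_PD_2.txt")

# Split each string on the literal "_" delimiter (fixed = TRUE: no regex)
parts <- strsplit(myvec, "_", fixed = TRUE)

# Drop the first and last pieces, then rejoin the remaining pieces with "_"
middle <- vapply(parts,
                 function(p) paste(p[-c(1, length(p))], collapse = "_"),
                 character(1))
middle
# same values as the sub() result: "LPH-001-10", "LPH-001-10_AK1", "LPH-001-10_AK2", "LPH-001-10_PD"
```

One behavioural difference: a string with no _ at all becomes "" here, whereas the sub() version leaves it unchanged because the pattern does not match.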