Olga Anufrieva - 1 year ago
R Question

Remove a part of string before underscore

I have a character vector of names that look like

"A00_A09_Intestinal_infectious_diseases" "A09_Diarrhoea_and_gastro_enteritis"

I would like to remove the IDs at the beginning of each string, so that they look like

"Intestinal_infectious_diseases" "Diarrhoea_and_gastro_enteritis"

I suppose this is possible, but due to my limited experience I could not get it to work.
Thank you for any help.

Answer

We can try sub. The pattern matches zero or more characters (.*) followed by a capital letter, one or more digits, and an underscore, and the match is replaced with "". Because .* is greedy, the match extends through the last such ID in the string.

sub(".*[A-Z][0-9]+_", "", str1)
#[1] "Intestinal_infectious_diseases" "Diarrhoea_and_gastro_enteritis"
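
As a quick check of how far the greedy .* reaches, here is a minimal, self-contained sketch. The name "A00_Type_B12_deficiency" is a hypothetical input that does not appear in the question; it is only there to show that everything up to the last ID-like token is removed, which may or may not be what you want.

```r
# Names from the question
str1 <- c("A00_A09_Intestinal_infectious_diseases",
          "A09_Diarrhoea_and_gastro_enteritis")

# Greedy .* runs up to the LAST "capital letter + digits + underscore" token
cleaned <- sub(".*[A-Z][0-9]+_", "", str1)
print(cleaned)

# Hypothetical name (an assumption for illustration) with an ID-like token in the middle
tricky <- "A00_Type_B12_deficiency"
greedy_result <- sub(".*[A-Z][0-9]+_", "", tricky)
print(greedy_result)  # "Type_" is removed together with both IDs
```

If names like the hypothetical one can occur in your data, a pattern that only matches at the start of the string is the safer choice.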

Or, to be more specific, we match one or more repetitions ({1,}) of a group made of a capital letter ([A-Z]), one or more digits ([0-9]+), and an underscore (_), and replace the match with an empty string ("").

sub("([A-Z][0-9]+_){1,}", "", str1)
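
The pattern above is not anchored, so it removes the first run of IDs wherever it occurs in the string. A possible variant, sketched here as a suggestion rather than as part of the original answer, adds ^ so that only IDs at the very beginning are removed. The name "Type_B12_deficiency" is a hypothetical input used only to show the difference.

```r
# Names from the question
str1 <- c("A00_A09_Intestinal_infectious_diseases",
          "A09_Diarrhoea_and_gastro_enteritis")

# Anchored variant: strip one or more ID groups only at the start of the string
anchored <- sub("^([A-Z][0-9]+_)+", "", str1)
print(anchored)

# Hypothetical name (an assumption for illustration) without a leading ID
no_prefix <- "Type_B12_deficiency"

# Unanchored pattern removes the ID-like token in the middle
unanchored_result <- sub("([A-Z][0-9]+_){1,}", "", no_prefix)
print(unanchored_result)

# Anchored pattern leaves the name unchanged
anchored_result <- sub("^([A-Z][0-9]+_)+", "", no_prefix)
print(anchored_result)
```

For the two names in the question, both patterns give the same result; they differ only when an ID-like token appears somewhere other than the start.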


# Data used in the examples above
str1 <- c("A00_A09_Intestinal_infectious_diseases", "A09_Diarrhoea_and_gastro_enteritis")
