Olga Anufrieva - 2 months ago
R Question

Remove the part of a string before an underscore

I have a character vector of names that look like

A00_A09_Intestinal_infectious_diseases
A09_Diarrhoea_and_gastro_enteritis


I would like to remove the IDs at the beginning of each string, so that they look like

Intestinal_infectious_diseases
Diarrhoea_and_gastro_enteritis


I suppose this can be done with gsub, but with my limited experience I couldn't get it to work.
Thank you for any help.

Answer

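We can try with sub. Match zero or more characters (.*) followed by a capital letter, one or more digits and an underscore, and replace the match with "". Because .* is greedy, the match extends to the last such ID, so both A00_ and A09_ are removed from the first string.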

sub(".*[A-Z][0-9]+_", "", str1)
#[1] "Intestinal_infectious_diseases" "Diarrhoea_and_gastro_enteritis"
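One caveat with this pattern: since .* is greedy, the match ends at the last capital-letter + digits + underscore run anywhere in the name, not only in the ID prefix. A quick check with a made-up name (hypothetical, not from the question) whose disease text itself contains such a run, compared with the ([A-Z][0-9]+_){1,} pattern shown next:

```r
# Hypothetical name: after the "D51_" ID, the text itself contains "B12_"
x <- "D51_Vitamin_B12_deficiency_anaemia"

# Greedy .* lets the match run up to the LAST letter-digits-underscore run
greedy <- sub(".*[A-Z][0-9]+_", "", x)
greedy
#[1] "deficiency_anaemia"

# sub() replaces the leftmost match, which here is the ID run "D51_" at the start
grouped <- sub("([A-Z][0-9]+_){1,}", "", x)
grouped
#[1] "Vitamin_B12_deficiency_anaemia"
```

For names like the ones in the question, both patterns give the same result; they only differ when the name part contains something that looks like an ID.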

Or, to be more specific, match one or more repetitions ({1,}, equivalent to +) of a capital letter ([A-Z]) followed by one or more digits ([0-9]+) and an underscore (_), and replace the match with an empty string ("").

sub("([A-Z][0-9]+_){1,}", "", str1)
#[1] "Intestinal_infectious_diseases" "Diarrhoea_and_gastro_enteritis"
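This pattern is not anchored, so if a name has no ID prefix, sub() will remove the first letter-digits-underscore run it finds elsewhere in the string. Adding ^ restricts the removal to the start of the string. A small sketch with one hypothetical prefix-free name added next to a name from the question (+ is used here in place of the equivalent {1,}):

```r
# First element is from the question; the second is a hypothetical
# name without an ID prefix
y <- c("A00_A09_Intestinal_infectious_diseases", "Vitamin_B12_deficiency")

# Unanchored: the first match can occur anywhere in the string
unanchored <- sub("([A-Z][0-9]+_){1,}", "", y)
unanchored
#[1] "Intestinal_infectious_diseases" "Vitamin_deficiency"

# Anchored with ^: only a run of IDs at the very start is removed
anchored <- sub("^([A-Z][0-9]+_)+", "", y)
anchored
#[1] "Intestinal_infectious_diseases" "Vitamin_B12_deficiency"
```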

data

str1 <- c("A00_A09_Intestinal_infectious_diseases", "A09_Diarrhoea_and_gastro_enteritis")