I'd like to remove digits from the variable names of a data frame like df1 to generate a data frame like df2. A digit should be removed only if it is immediately preceded by at least two consecutive letters (so the digits in suffixes such as `_E1` stay), except for the digit 4, which should always be kept. Many thanks.
Current df:
```r
df1 <- data.frame("ACO2_E1_E2"=c(1,1,1), "BCKDHB6_E1"=c(1,1,1),
                  "CDDD4_E3"=c(1,1,1), "HDFE1"=c(1,1,1))
```
Desired df:
```r
df2 <- data.frame("ACO_E1_E2"=c(1,1,1), "BCKDHB_E1"=c(1,1,1),
                  "CDDD4_E3"=c(1,1,1), "HDFE"=c(1,1,1))
```
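My attempt, which keeps the 4 but also removes the digits in E1, E2 and E3:
```r
# Delete every run of digits other than 4, wherever it occurs
gsub("[012356789]+", "", names(df1))
```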
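To see why this attempt falls short, the sketch below (with df1 redefined so it runs on its own) compares its output with a literal reading of "two word characters" as the regex `(\\w{2})`. Both give the same wrong result: the attempt ignores what precedes the digit, and `\\w` also matches `_` and digits, so `_E1` counts as two word characters followed by a digit. The variable names `attempt` and `word_chars` are only for illustration.

```r
df1 <- data.frame("ACO2_E1_E2" = c(1, 1, 1), "BCKDHB6_E1" = c(1, 1, 1),
                  "CDDD4_E3" = c(1, 1, 1), "HDFE1" = c(1, 1, 1))

# The attempt: delete every run of non-4 digits, regardless of context
attempt <- gsub("[012356789]+", "", names(df1))

# Literal "two word characters": \w matches letters, digits and "_",
# so "_E1", "_E2" and "_E3" are stripped as well
word_chars <- gsub("(\\w{2})[012356789]", "\\1", names(df1))

print(attempt)
print(word_chars)
```

Both vectors come out as `"ACO_E_E" "BCKDHB_E" "CDDD4_E" "HDFE"`, which is why the working rule has to require letters rather than word characters.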
Match two uppercase letters followed by a digit other than 4, and replace the match with just the two captured letters (`\\1`):
```r
# Capture two uppercase letters, then drop the next digit unless it is a 4
x <- gsub("([A-Z]{2})[012356789]", "\\1", names(df1))
x
## [1] "ACO_E1_E2" "BCKDHB_E1" "CDDD4_E3" "HDFE"
identical(x, names(df2))
## [1] TRUE
```
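The pattern above removes only one digit after the two letters, which is enough for the example data. If a name could carry a run of several digits, a possible variant (a sketch, not part of the original answer) adds `+` to the digit class and then assigns the result back with `names<-` so that df1 itself is renamed. The helper `strip_digits` and the name `"ABC12"` are hypothetical and used only for illustration.

```r
df1 <- data.frame("ACO2_E1_E2" = c(1, 1, 1), "BCKDHB6_E1" = c(1, 1, 1),
                  "CDDD4_E3" = c(1, 1, 1), "HDFE1" = c(1, 1, 1))

# Hypothetical helper: two uppercase letters, then one or more digits other than 4.
# The "+" drops a whole run such as "12" instead of only its first digit;
# a run stops at a 4, so the 4 is always kept.
strip_digits <- function(nms) gsub("([A-Z]{2})[0-35-9]+", "\\1", nms)

# Rename the columns of df1 in place
names(df1) <- strip_digits(names(df1))
print(names(df1))

# Hypothetical multi-digit name: the single-digit pattern leaves a digit behind
single <- gsub("([A-Z]{2})[012356789]", "\\1", "ABC12")
multi <- strip_digits("ABC12")
print(c(single, multi))
```

For the question's data both patterns give the same names; the difference only shows up with names like `"ABC12"`, where the single-digit pattern returns `"ABC2"` and the `+` variant returns `"ABC"`.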