LLL LLL - 3 years ago 196
R Question

Removing some but not all digits from variable names in R

I'd like to remove digits from the variable names of a data frame like df1 to generate a data frame like df2. I'd like for digits to be removed only if there are at least two consecutive word characters before the digit, with the exception of the digit 4 which I'd like to keep at all times. Many thanks.

Current df:

df1 <- data.frame("ACO2_E1_E2"=c(1,1,1),"BCKDHB6_E1"=c(1,1,1) ,

Desired df:

df2 <- data.frame("ACO_E1_E2"=c(1,1,1),"BCKDHB_E1"=c(1,1,1) ,

My attempt: (I manage to remove/keep the correct digits but only indiscriminately, and can't figure out how to introduce the other criterion of at least two consecutive word characters before the digit.)

gsub('[0,1,2,3,5,6,7,8,9]+', '', names(df1))

Answer

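Match two consecutive uppercase letters followed by a digit other than 4, and replace the match with just the two captured letters. Using [A-Z] rather than \w matters here, because the underscore counts as a word character and "_E1" would otherwise lose its digit: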

x <- gsub("([A-Z]{2})[012356789]", "\\1", names(df1))
## [1] "ACO_E1_E2" "BCKDHB_E1" "CDDD4_E3"  "HDFE"   

identical(x, names(df2))
## [1] TRUE
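
A self-contained sketch of the same approach, in case it helps to run it in isolation. The data frame below is an assumption made for illustration: the column name HDFE7 and the values are hypothetical, not taken from the question. The pattern also adds a + after the digit class, which is my variation rather than part of the answer above; it removes a whole run of non-4 digits instead of only the first one.

```r
# Hypothetical data frame; the name HDFE7 and the values are illustrative only.
df <- data.frame(ACO2_E1_E2 = c(1, 1, 1),
                 BCKDHB6_E1 = c(1, 1, 1),
                 CDDD4_E3   = c(1, 1, 1),
                 HDFE7      = c(1, 1, 1))

# ([A-Z]{2})  two consecutive uppercase letters, captured as group 1
# [0-35-9]+   one or more digits other than 4 (ranges 0-3 and 5-9)
# "\\1"       put the two captured letters back, dropping the digits
cleaned <- gsub("([A-Z]{2})[0-35-9]+", "\\1", names(df))

# Assign the cleaned names back to the data frame.
names(df) <- cleaned
print(names(df))
```

Digits that follow a single letter after an underscore (E1, E2, E3) are left alone because the underscore is not in [A-Z], so there are never two letters directly before them.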