John John - 1 month ago 7
R Question

Regex to replace at least 5 digits in a character string

I have a column in a data frame with addresses that are a composite of unit/house number, street name, locality, postcode, and phone number.

The postcode is a four-digit number.

Here is an example:

"26A JULIA STREET ANYTOWN 8523 71245632"


I want to strip the phone numbers but keep the postcodes and other numbers to return:

"26A JULIA STREET ANYTOWN 8523"


I have tried the following:

str_replace(string=field_name$ADDRESS, pattern="\\d{5,}", replacement="")


It does not remove the phone numbers. Can anyone point out where I am going wrong?

Answer

I personally like the extra detail of the stringi package (and stringr just wraps it anyway):

library(stringi)
library(magrittr)

field_name <- data.frame(ADDRESS="26A JULIA STREET ANYTOWN 8523 71245632", stringsAsFactors=FALSE)

stri_replace_last_regex(field_name$ADDRESS, "[[:digit:]]{5,}", "") %>% 
  stri_trim()
## [1] "26A JULIA STREET ANYTOWN 8523"
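
If you would rather not load a package, the same idea can be sketched in base R. This is only a sketch: it assumes the phone number, when present, is always the last token in the string, and the `addresses` data frame and its second row are made up for illustration. Including the surrounding whitespace in the pattern means no separate trim step is needed.

```r
# Hypothetical sample data; the second row has no phone number.
addresses <- data.frame(
  ADDRESS = c("26A JULIA STREET ANYTOWN 8523 71245632",
              "5 HIGH STREET OTHERTOWN 8001"),
  stringsAsFactors = FALSE
)

# Remove a run of 5 or more digits at the end of the string,
# together with any whitespace before or after it.
# sub() returns a new vector, so the result is assigned back to the column.
addresses$ADDRESS <- sub("\\s*\\d{5,}\\s*$", "", addresses$ADDRESS, perl = TRUE)

print(addresses$ADDRESS)
```

Because the pattern requires at least five digits anchored at the end, the four-digit postcode is left alone, and rows without a phone number come back unchanged.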