Cyrus Mohammadian - 3 months ago
R Question

Remove everything except period and numbers from string regex in R

I know there are many questions on Stack Overflow regarding regex, but I cannot accomplish this one easy task with the help I've found. Here's my data:

a<-c("Los Angeles, CA","New York, NY", "San Jose, CA")
b<-c("c(34.0522, 118.2437)","c(40.7128, 74.0059)","c(37.3382, 121.8863)")

df<-data.frame(a,b)
df
                a                    b
1 Los Angeles, CA c(34.0522, 118.2437)
2    New York, NY  c(40.7128, 74.0059)
3    San Jose, CA c(37.3382, 121.8863)


I would like to remove everything but the numbers and the period (i.e. remove "c", ")" and "("). This is what I've tried thus far:

str_replace(df$b,"[^0-9.]","" )
[1] "(34.0522, 118.2437)" "(40.7128, 74.0059)" "(37.3382, 121.8863)"

str_replace(df$b,"[^\\d\\)]+","" )
[1] "34.0522, 118.2437)" "40.7128, 74.0059)" "37.3382, 121.8863)"


Not sure what's left to try. I would like to end up with the following:

[1] "34.0522, 118.2437" "40.7128, 74.0059" "37.3382, 121.8863"


Thanks.

Answer

Try this

gsub("[c()]", "", df$b)
#[1] "34.0522, 118.2437" "40.7128, 74.0059"  "37.3382, 121.8863"
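
It may also help to see why the attempts in the question fell short. str_replace() replaces only the first match in each string, much like base R's sub(), whereas str_replace_all() and gsub() replace every match. The sketch below uses base R only; the variable names first_only and cleaned are illustrative, and the negated character class keeps the comma and space because the desired output retains them.

```r
b <- c("c(34.0522, 118.2437)", "c(40.7128, 74.0059)", "c(37.3382, 121.8863)")

# sub() stops after the first match, so only the leading "c" is removed
# (this mirrors what str_replace() did in the question)
first_only <- sub("[^0-9.]", "", b)
first_only
#[1] "(34.0522, 118.2437)" "(40.7128, 74.0059)"  "(37.3382, 121.8863)"

# gsub() replaces every match; digits, period, comma and space are kept
# stringr equivalent: str_replace_all(b, "[^0-9., ]", "")
cleaned <- gsub("[^0-9., ]", "", b)
cleaned
#[1] "34.0522, 118.2437" "40.7128, 74.0059"  "37.3382, 121.8863"
```

The negated class is more general than listing "c", "(" and ")" explicitly, since it would also strip any other stray characters, but it assumes nothing outside digits, periods, commas and spaces needs to be preserved.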