Seb Seb - 2 months ago 6
R Question

Converting internationally formatted strings to numeric

I have a file with internationally formatted numbers (i.e strings) including units of measurement. In this case the decimal place is indicated by "," and the 1e3 seperator is indicated as "." (i.e. German number formats).

a <- c('2.200.222 €',
' 180.109,3 €')


or

b <- c('28,42 m²',
'47,70 m²')


I'd like to convert these strings efficiently to
numeric
. I've tried to filter out numbers by codes like

require(stringr)
str_extract(a, pattern='[0-9]+.[0-9]+.[0-9]+')
str_extract(b, pattern='[0-9]+,[0-9]+')


however, this does seem to be too prone to errors and I guess there must be a more standardized way. So here's my question: Is there a custom function, package or something else that is capable of such a problem?

Thank you very much!

Answer

Here is a function that uses gsub to deal with the sample data you posted:

x <- c('2.200.222   €', '  180.109,3 €', '28,42 m²', '47,70 m²')

strip <- function(x){
  z <- gsub("[^0-9,.]", "", x)
  z <- gsub("\\.", "", z)
  gsub(",", ".", z)
}

as.numeric(strip(x))
[1] 2200222.00  180109.30      28.42      47.70

It works like this:

  1. First strip out all trailing non-digits (and anything after these non-digits)
  2. Then strip out all periods.
  3. Finally, convert commas to periods.
Comments