Seb - 7 months ago
R Question

Converting internationally formatted strings to numeric

I have a file with internationally formatted numbers (i.e. strings), including units of measurement. In this case the decimal mark is "," and the thousands (1e3) separator is "." (i.e. German number format).

a <- c('2.200.222 €',
' 180.109,3 €')


b <- c('28,42 m²',
'47,70 m²')

I'd like to convert these strings efficiently to numeric. I've tried to extract the numbers with code like

str_extract(a, pattern='[0-9]+.[0-9]+.[0-9]+')
str_extract(b, pattern='[0-9]+,[0-9]+')

however, this seems too error-prone, and I guess there must be a more standardized way. So here's my question: is there a custom function, package or something else that can handle this problem?

Thank you very much!


Here is a function that uses gsub to deal with the sample data you posted:

x <- c('2.200.222   €', '  180.109,3 €', '28,42 m²', '47,70 m²')

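strip <- function(x){
  z <- gsub("[^0-9,.]", "", x)  # keep only digits, commas and periods
  z <- gsub("\\.", "", z)       # drop the thousands separators
  gsub(",", ".", z)             # use "." as the decimal mark
}

as.numeric(strip(x))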
[1] 2200222.00  180109.30      28.42      47.70

It works like this:

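  1. First strip out every character that is not a digit, comma or period (the currency and unit symbols and any whitespace).
  2. Then strip out all periods, which are the thousands separators.
  3. Finally, convert commas to periods, giving strings that as.numeric() can parse.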
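
If the marks differ between files, the same steps could be wrapped in a small helper that takes them as arguments. This is only a sketch: `parse_intl_number` and its `grouping_mark` / `decimal_mark` arguments are names made up for illustration, not part of any package, and it assumes the units contain no ASCII digits, periods or minus signs of their own (so "m²" is fine, but "m2" would not be).

```r
# Hypothetical helper (not from the original answer): parse numbers written
# with a given grouping mark and decimal mark, ignoring units and whitespace.
parse_intl_number <- function(x, grouping_mark = ".", decimal_mark = ",") {
  # 1. Drop the grouping mark; fixed = TRUE treats it as a literal string
  z <- gsub(grouping_mark, "", x, fixed = TRUE)
  # 2. Switch the decimal mark to the "." that as.numeric() expects
  z <- gsub(decimal_mark, ".", z, fixed = TRUE)
  # 3. Remove everything that is not an ASCII digit, a period or a minus sign
  z <- gsub("[^0-9.-]", "", z)
  as.numeric(z)
}

x <- c('2.200.222   €', '  180.109,3 €', '28,42 m²', '47,70 m²')
parse_intl_number(x)

# English-style input only needs the two marks swapped
parse_intl_number("1,234.5 USD", grouping_mark = ",", decimal_mark = ".")
```

The grouping mark is removed before the decimal mark is converted, so the periods that step 2 introduces are never mistaken for thousands separators.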