Seb - 4 months ago 11

R Question

I have a file with internationally formatted numbers (i.e., strings) that include units of measurement. In this case the decimal mark is "," and the thousands separator is "." (i.e., the German number format).

`a <- c('2.200.222 €', ' 180.109,3 €')`

or

`b <- c('28,42 m²', '47,70 m²')`

I'd like to convert these strings efficiently to `numeric`. So far I have tried `stringr`:

`require(stringr)`

`str_extract(a, pattern='[0-9]+.[0-9]+.[0-9]+')`

`str_extract(b, pattern='[0-9]+,[0-9]+')`

However, this seems too error-prone, and I guess there must be a more standardized way. So here's my question: is there a custom function, package, or something else that can handle this kind of problem?

Thank you very much!

Answer

Here is a function that uses `gsub` to deal with the sample data you posted:

```
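x <- c('2.200.222 €', ' 180.109,3 €', '28,42 m²', '47,70 m²')
strip <- function(x) {
  z <- gsub("[^0-9,.]", "", x)  # keep only digits, commas, and periods
  z <- gsub("\\.", "", z)       # drop the thousands separators
  gsub(",", ".", z)             # use a period as the decimal mark
}
as.numeric(strip(x))
# [1] 2200222.00 180109.30 28.42 47.70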
```

It works like this:

- First, strip out every character that is not a digit, comma, or period (spaces, currency symbols, and units).
- Then strip out all periods.
- Finally, convert commas to periods.
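
If other number formats turn up, the same three steps can be parameterized. The following is only a sketch in base R: `parse_localized` and its arguments `decimal_mark` and `grouping_mark` are hypothetical names, not part of any package, and it assumes non-negative values with at most one number per string. It removes the grouping mark first, so a period inside a unit would still survive the final clean-up.

```r
# Hypothetical helper: the same steps as `strip`, with the two marks
# passed in as arguments instead of being hard-coded.
parse_localized <- function(x, decimal_mark = ",", grouping_mark = ".") {
  z <- gsub(grouping_mark, "", x, fixed = TRUE)  # drop thousands separators
  z <- gsub(decimal_mark, ".", z, fixed = TRUE)  # normalize the decimal mark
  z <- gsub("[^0-9.]", "", z)                    # drop units, symbols, spaces
  as.numeric(z)
}

# German-style sample data from the question
x <- c('2.200.222 €', ' 180.109,3 €', '28,42 m²', '47,70 m²')
parse_localized(x)

# Assumed extra example: apostrophe as grouping mark, period as decimal mark
parse_localized("1'234.5 CHF", decimal_mark = ".", grouping_mark = "'")
```

As a packaged alternative, `readr::parse_number()` accepts a `locale(decimal_mark = ",", grouping_mark = ".")` argument and should cover the same cases, though it is worth checking against your own data.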