Floo0 - 13 days ago
R Question

Parsing German numbers within a string vector

I have a character vector as follows:

x <- c("31.12.2009EUR", "31.12.2009", "23.753,38", "0,00")


I would like to parse it as

c(NA, NA, 23753.38, 0.00)


I tried:

library(readr)
parse_number(x, locale = locale(decimal_mark = ",")) # "." (the grouping mark) is dropped wherever it appears, so the dates are parsed too
#> 31122009.00 31122009.00 23753.38 0.00

parse_double(x, locale=locale(decimal_mark = ","))
#> NA NA NA 0


The only way I came up with:

library(readr)
out <- rep(NA_real_, length(x))                        # NA unless the element matches
ind <- grep("^[0-9]{1,3}(\\.[0-9]{3})*,[0-9]{2}", x)  # German-formatted numbers only
out[ind] <- parse_number(x[ind], locale = locale(decimal_mark = ","))
out
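A quick base-R check of which elements the regex above selects, using the same pattern with the comma written unescaped. The pattern is anchored only at the start, so a string with trailing text after the two decimals would still match; adding `$` closes that gap. The string "1.234,56EUR" is a made-up example, not from the question's data.

```r
x <- c("31.12.2009EUR", "31.12.2009", "23.753,38", "0,00")
pat_open   <- "^[0-9]{1,3}(\\.[0-9]{3})*,[0-9]{2}"  # anchored at the start only
pat_closed <- paste0(pat_open, "$")                 # anchored at both ends

grepl(pat_open, x)    # FALSE FALSE TRUE TRUE
grepl(pat_closed, x)  # FALSE FALSE TRUE TRUE

# The difference only shows up with trailing text (hypothetical input):
grepl(pat_open, "1.234,56EUR")    # TRUE
grepl(pat_closed, "1.234,56EUR")  # FALSE
```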

Answer

This one-liner uses no packages and no complex regular expressions. It assumes that the valid elements contain a comma and the invalid ones do not. That holds for the sample input shown; if it does not hold for your real data, use a more specific regular expression in grepl() that encodes whatever the criterion is.

as.numeric(ifelse(grepl(",", x), chartr(",", ".", gsub(".", "", x, fixed = TRUE)), NA))
## [1]       NA       NA 23753.38     0.00
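Following the suggestion to use a more specific regular expression in grepl(), here is a minimal base-R sketch that only accepts strings consisting entirely of a German-formatted number: digits grouped in threes by `.`, then `,` and exactly two decimals. The function name `parse_german` and the exact pattern are assumptions for illustration; widen the pattern if your data contains, for example, negative numbers or a different number of decimals.

```r
# Hypothetical helper: NA unless the whole string is a German-formatted number
parse_german <- function(x) {
  # assumed format: "." as thousands separator, "," plus two decimals, nothing else
  valid <- grepl("^[0-9]{1,3}(\\.[0-9]{3})*,[0-9]{2}$", x)
  # drop the thousands separators, then turn the decimal comma into a point
  cleaned <- chartr(",", ".", gsub(".", "", x, fixed = TRUE))
  out <- rep(NA_real_, length(x))
  out[valid] <- as.numeric(cleaned[valid])
  out
}

x <- c("31.12.2009EUR", "31.12.2009", "23.753,38", "0,00")
parse_german(x)
## [1]       NA       NA 23753.38     0.00
```

Converting only the `valid` elements avoids the "NAs introduced by coercion" warning that as.numeric() would otherwise emit for the non-numeric strings.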