David Kaufman - 2 months ago
R Question

qdap package: bug in converting zero digits to "zero" words

Before (as a rookie) I go submitting this as an R package bug, let me run it by y'all. I think all of the following are good:

replace_number("123 0 boogie")
[1] "one hundred twenty three boogie"
replace_number("1;1 foo")
[1] "one;one foo"
replace_number("47 bar")
[1] "forty seven bar"

I think all of the following are bad because "zero" is missing from the output:

replace_number("1;0 foo")
[1] "one; foo"
replace_number("00 bar")
[1] "bar"
[1] "x"

Basically, I'd say that replace_number is incapable of handling strings that contain the digit 0 (except for "0"). Is it a real bug?


If you dig into the guts of replace_number:

 unlist(lapply(lapply(gsub(",([0-9])", "\\1", text.var), function(x) {
        if (!is.na(x) & length(unlist(strsplit(x, "([0-9])", 
            perl = TRUE))) > 1) {
            num_sub(x, num.paste = num.paste)
        } else {
            x
        }
    }), function(x) mgsub(0:9, ones, x)))

you can see that the problem occurs in qdap:::num_sub:

qdap:::num_sub("101", num.paste = "combine") ## "onehundredone"
qdap:::num_sub("0", num.paste = "combine")   ## ""
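
The if() condition in the snippet above also hints at why a lone "0" escapes the problem. Here is a minimal base-R sketch that reuses only that strsplit test; the helper name takes_num_sub_branch is made up for illustration, and nothing here calls qdap itself:

```r
## Hypothetical helper: reproduces only the if() condition from the
## replace_number snippet above. It does not call qdap.
takes_num_sub_branch <- function(x) {
    !is.na(x) && length(unlist(strsplit(x, "([0-9])", perl = TRUE))) > 1
}

inputs <- c("0", "00 bar", "1;0 foo", "47 bar")
## Named logical vector: TRUE means the string would be handed to num_sub.
branch <- vapply(inputs, takes_num_sub_branch, logical(1))
print(branch)
```

Splitting "0" on digits leaves a single empty piece, so the condition is FALSE and the string skips num_sub. The other three inputs split into more than one piece and would be routed through num_sub, which is consistent with the observation that only "0" by itself is handled.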

Digging into that function, the issue arises in numb2word, which has these internal codes:

ones <- c("", "one", "two", "three", "four", "five", "six", 
    "seven", "eight", "nine")
names(ones) <- 0:9

which convert zero values to blanks. If I were facing this problem myself, I would fork the qdap repo, go to replace_number.R, and try to change this in a backward-compatible way: give replace_number a logical argument blank_zeros = TRUE that is passed down to numb2word, so that the default keeps the current behaviour and blank_zeros = FALSE produces "zero", e.g.

ones <- c(if (blank_zeros) "" else "zero",
          "one", "two", "three", "four", "five", "six",
          "seven", "eight", "nine")

In the meantime I have posted this on the qdap issues list.
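
To illustrate the blank_zeros idea outside of qdap, here is a minimal base-R sketch. The function digits_to_words is hypothetical and only mimics a per-digit lookup; it is not qdap's implementation and makes no attempt at tens, hundreds and so on.

```r
## Hypothetical stand-in for the per-digit lookup; not qdap code.
digits_to_words <- function(x, blank_zeros = TRUE) {
    ones <- c(if (blank_zeros) "" else "zero",
              "one", "two", "three", "four", "five", "six",
              "seven", "eight", "nine")
    names(ones) <- 0:9
    ## Split the string into single characters and replace each digit
    ## by looking it up by name; non-digits are left untouched.
    chars <- strsplit(x, "")[[1]]
    is_digit <- chars %in% names(ones)
    chars[is_digit] <- ones[chars[is_digit]]
    paste(chars, collapse = "")
}

old_behaviour <- digits_to_words("1;0 foo")                       # zeros blanked
new_behaviour <- digits_to_words("1;0 foo", blank_zeros = FALSE)  # zeros kept
print(c(old_behaviour, new_behaviour))
```

Defaulting blank_zeros to TRUE is what would keep existing callers unaffected; only code that opts in with blank_zeros = FALSE would see "zero" in the output.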