E B E B - 2 months ago
R Question

In R, how do I get mapply to ignore NA in the elements passed to the function?

I created a simple function to compare two variables in a data frame:

detYearDisc <- function(x, y)
{
  if (x < y)
    return("L")
  if (x > y)
    return("G")
  if (x == y)
    return("N")
}


The data frame df can contain NA in x, in y, or in both. When I run mapply:

df$DiscInd = mapply(detYearDisc, df$X, df$Y)


I get the following error:

Error in if (x < y) return("L") : missing value where TRUE/FALSE needed


Is this because x or y is NA?

Answer

Yes, the error occurs whenever either of them is NA. See the following:

mapply(detYearDisc, 1,2)
#[1] "L"
mapply(detYearDisc, 2,2)
#[1] "N"
mapply(detYearDisc, 2,1)
#[1] "G"
mapply(detYearDisc, 2,NA)
#Error in if (x < y) return("L") : missing value where TRUE/FALSE needed
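The underlying reason is how R compares with NA: any comparison involving NA yields NA rather than TRUE or FALSE, and `if()` needs a single TRUE or FALSE to branch on. A minimal base-R illustration (the names `cmp` and `msg` are only for this sketch):

```r
# Any comparison involving NA yields NA, not TRUE or FALSE
cmp <- 2 < NA
print(cmp)

# if() cannot branch on NA, so it signals the same error as in the question;
# tryCatch() captures the error message instead of stopping
msg <- tryCatch(
  if (cmp) "L" else "not L",
  error = function(e) conditionMessage(e)
)
print(msg)
```

Inside `mapply`, `detYearDisc` receives one element of each column at a time, so a single NA row is enough to trigger the error for the whole call.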

To handle it, you can add the following as the first line in your function:

if (is.na(x) || is.na(y))
  return("Not a number!")
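Putting it together, a sketch of the full patched function applied with `mapply`, assuming lower-case column names `x` and `y` and the same four rows as the example data frame in this answer. `||` is used because `x` and `y` are single values inside `mapply`:

```r
# Question's function with the NA guard added as its first statement
detYearDisc <- function(x, y) {
  if (is.na(x) || is.na(y))
    return("Not a number!")
  if (x < y)
    return("L")
  if (x > y)
    return("G")
  if (x == y)
    return("N")
}

# Example data; the last row has a missing x
df <- data.frame(x = c(1, 3, 5, NA), y = c(5, 0, 1, 4))
df$DiscInd <- mapply(detYearDisc, df$x, df$y)
print(df$DiscInd)
```

If you would rather have a real missing value in the output than a marker string, returning `NA_character_` from the guard would keep the column consistent with the vectorized approach.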

You can also skip mapply entirely and do the comparison in a vectorized way with nested ifelse calls:

ifelse(df$x > df$y, "G", ifelse(df$x < df$y, "L", "N"))

Wherever x or y is NA, ifelse returns NA for that element. For example, given:

df
   x y
1  1 5
2  3 0
3  5 1
4 NA 4

the call returns:

[1] "L" "G" "G" NA 
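A runnable version of that example, plus one further base-R variation that is not from the original answer: `sign()` maps each difference to -1, 0 or 1, and adding 2 turns it into an index into the labels. The names `res_ifelse` and `res_sign` are illustrative only:

```r
# Example data from the answer
df <- data.frame(x = c(1, 3, 5, NA), y = c(5, 0, 1, 4))

# Nested ifelse: NA in x or y propagates to NA in the result
res_ifelse <- ifelse(df$x > df$y, "G", ifelse(df$x < df$y, "L", "N"))

# Lookup variation: sign(x - y) + 2 gives index 1 ("L"), 2 ("N") or 3 ("G");
# a numeric NA index returns NA for that element only
res_sign <- c("L", "N", "G")[sign(df$x - df$y) + 2]

print(res_ifelse)
print(res_sign)
```

Both give `"L" "G" "G" NA`. The lookup relies on the index being numeric: a logical NA index would be recycled over the whole label vector, but `sign()` never returns a logical.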

Alternatively, you could use case_when from the dplyr package (thanks to @alistaire for pointing it out):

library(dplyr)

f <- function(x, y) {
  case_when(
    is.na(x) | is.na(y) ~ NA_character_,  # missing input gives a real NA
    x > y ~ "G",
    x < y ~ "L",
    TRUE ~ "N"                            # remaining case: x == y
  )
}

Calling f(df$x, df$y) on the example data frame then gives the same result as the nested ifelse.
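One detail worth checking in the case_when version: the string `"NA"` is an ordinary two-letter value, not a missing value, so `is.na()` does not detect it and it will not match the NA produced by ifelse. A quick base-R check (variable names are illustrative):

```r
as_string  <- "NA"           # a two-letter string
as_missing <- NA_character_  # a genuinely missing character value

print(is.na(as_string))   # the string is not treated as missing
print(is.na(as_missing))  # the missing value is

# The nested ifelse result contains a real NA in the fourth position
res_ifelse <- c("L", "G", "G", NA)
print(identical(res_ifelse, c("L", "G", "G", "NA")))
print(identical(res_ifelse, c("L", "G", "G", NA_character_)))
```

This is why `NA_character_` is the safer right-hand side in case_when when the intent is a missing result: all branches stay character, and the output matches ifelse exactly.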