JavierM88 - 2 months ago
R Question

Error when replacing a value in a data frame with NAs

Say I have this data frame z:

x <- c("NS","NS",NA)
y <- c("yes","yes","b")
z <- as.data.frame(cbind(x,y), stringsAsFactors=FALSE)

> z
     x   y
1   NS yes
2   NS yes
3 <NA>   b


I just want to change the values that are "yes" to "a". If I do this I get an error:

z[z$x=="NS","yes"]<-"a"
Error in `[<-.data.frame`(`*tmp*`, z$x == "NS", "yes", value = "a") :
missing values are not allowed in subscripted assignments of data frames


For some reason the subset seems to include the row with NA, even though I only subset by "NS". If I try to remove the NA, I get another error:

na.omit(z[z$x=="NS","a"])<-"no"
Error in na.omit(z[z$x == "NS", "a"]) <- "no" :
could not find function "na.omit<-"

Answer

The first problem is to specify the variable name correctly, that is, with the column name and not the value (probably just a typo in your question): "y" and not "yes".

Then another problem arises when you use == and it tries to think of what to do with the NA in the third row:

x=="NS"
[1] TRUE TRUE   NA

Hmm, should the third row be kept or not? NA is neither TRUE nor FALSE, so the data frame assignment cannot "decide" whether to modify that row and gives an error.

While, using %in% (which is actually match(x, table, nomatch = 0) > 0), we get:

x %in% "NS"
[1]  TRUE  TRUE FALSE

There you go: NA doesn't match the value "NS", so match() returns 0, which becomes FALSE as a logical. We shouldn't keep it.
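To make the link between match() and %in% concrete, here is a small sketch that rebuilds the logical vector by hand. The names pos and manual are just illustrative, and x is the same vector as in the question.

```r
x <- c("NS", "NS", NA)

# match() returns the position of each element of x in the table,
# or the nomatch value (0 here) when there is no match, including for NA
pos <- match(x, "NS", nomatch = 0)
pos
# [1] 1 1 0

# Comparing with 0 gives a logical vector that contains no NA
manual <- pos > 0
manual
# [1]  TRUE  TRUE FALSE

# This is the same result that %in% gives
identical(manual, x %in% "NS")
# [1] TRUE
```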

Thus, to get what you want:

z[z$x %in% "NS", "y"] <- "a"
z
#     x y
#1   NS a
#2   NS a
#3 <NA> b
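
If you would rather keep ==, two other ways to keep NA out of the index should also work. This is only a sketch on the data from the question, and z1, z2 and z3 are hypothetical copies used to compare the results.

```r
x <- c("NS", "NS", NA)
y <- c("yes", "yes", "b")
z <- data.frame(x, y, stringsAsFactors = FALSE)

# Option 1: which() returns only the positions that are TRUE, so NA is dropped
z1 <- z
z1[which(z1$x == "NS"), "y"] <- "a"

# Option 2: rule out NA explicitly; FALSE & NA evaluates to FALSE
z2 <- z
z2[!is.na(z2$x) & z2$x == "NS", "y"] <- "a"

# Reference: the %in% version from the answer
z3 <- z
z3[z3$x %in% "NS", "y"] <- "a"

# All three approaches give the same data frame
identical(z1, z3) && identical(z2, z3)
# [1] TRUE
```

%in% is the shortest of the three. The which() form can be handy when you need the row positions for something else as well.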