BIN BIN - 3 months ago 10
R Question

Merge and change NA separately in R

My goal is merging 2 data sets together but I need to keep track "NA", I need to "NA" when I merge df1 and df2, and "NA" of df1 assign something like 9999. The problem is my data have so many variables, it includes day, numeric, character..., so when I try to set "NA" of df1 by

df1[is.na(df1)] <- 9999
, it works only numeric, there is any way I can approach my solution to have different NA seperately.

df1 <- data.frame(ID= c(1:10),
Value=c(3,NA,7,2:8),
Group = c("A",NA,"C","D",NA,"B",NA,"C","D",NA))

df2 <- data.frame(ID = c(5:14),Count =c(1:9,NA),
School = c("A",NA,"C","D",NA,"B","NA","C","D",NA))

df1[is.na(df1)] <- 9999

data <- merge(df1,df2,all = TRUE,by= "ID")

ID Value Group Count School
1 1 3 A NA <NA>
2 2 9999 <NA>* NA <NA>
3 3 7 C NA <NA>
4 4 2 D NA <NA>
5 5 3 <NA>* 1 A
6 6 4 B 2 <NA>
7 7 5 <NA>* 3 C
8 8 6 C 4 D
9 9 7 D 5 <NA>
10 10 8 <NA>* 6 B
11 11 NA <NA> 7 NA
12 12 NA <NA> 8 C
13 13 NA <NA> 9 D
14 14 NA <NA> NA <NA>


The ones
*
suppose 9999

Answer

I'd like to contribute a bit more to this question. If you have, say, 100 columns in various classes and try to replace all NAs, you could try the following. The idea is that you convert all columns to character and replace all NAs with 9999. Then, you want to convert the classes of the columns back to the original classes. Finally, you merge df1 and df2.

library(dplyr)

# Save original classes.
original <- unlist(lapply(df1, class))

# Convert all columns to character and replace NAs with 9999
mutate_all(df1, as.character) %>%
mutate_each(funs(recode(., .missing = "9999"))) -> df1

# http://stackoverflow.com/questions/7680959/convert-type-of-multiple-columns-of-a-dataframe-at-once
# Credit to joran for this function.

convert.magic <- function(obj,types){

for (i in 1:length(obj)){
    FUN <- switch(types[i],character = as.character, 
                           numeric = as.numeric, 
                           factor = as.factor,
                           integer = as.integer,
                           logical = as.logical)
        obj[,i] <- FUN(obj[,i])
    }
    obj
}

out <- convert.magic(df1, original) %>%
       full_join(df2, by = "ID")

out

#   ID Value Group Count School
#1   1     3     A    NA   <NA>
#2   2  9999  9999    NA   <NA>
#3   3     7     C    NA   <NA>
#4   4     2     D    NA   <NA>
#5   5     3  9999     1      A
#6   6     4     B     2   <NA>
#7   7     5  9999     3      C
#8   8     6     C     4      D
#9   9     7     D     5   <NA>
#10 10     8  9999     6      B
#11 11    NA  <NA>     7     NA
#12 12    NA  <NA>     8      C
#13 13    NA  <NA>     9      D
#14 14    NA  <NA>    NA   <NA>
Comments