MYjx MYjx - 3 months ago 10
R Question

How to get the pairwise overlap of missing-value indicators in R (similar to a correlation matrix)

I am looking for a simple way to compute, for every pair of variables, the share of rows in which both are missing, and then to draw a heatmap of the result, similar to a correlation matrix. My data looks like this:

set.seed(123)
data <- data.frame(
  id = 1:1000,
  age_missing = sample(c(0, 1), 1000, replace = TRUE),
  salary_missing = sample(c(0, 1), 1000, replace = TRUE),
  address_missing = sample(c(0, 1), 1000, replace = TRUE),
  gender_missing = sample(c(0, 1), 1000, replace = TRUE)
)


The ideal output is something like:

| var1   | var2    | Missing Percent |
|--------|---------|-----------------|
| age    | age     | 0.5             |
| age    | gender  | 0.05            |
| age    | address | 0.08            |
| gender | gender  | 0.15            |
| gender | age     | 0.05            |

Answer

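Maybe something along the lines of the following. Because the indicators are coded 0/1, the cross-product counts, for each pair of columns, the rows where both are 1; dividing by the number of rows turns the counts into proportions: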

dd <- as.matrix(data[,2:5])
crossprod(dd) / nrow(dd)

which yields

                age_missing salary_missing address_missing gender_missing
age_missing           0.493          0.231           0.251          0.244
salary_missing        0.231          0.497           0.248          0.271
address_missing       0.251          0.248           0.494          0.247
gender_missing        0.244          0.271           0.247          0.506
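
The question also asks for a long var1/var2 table and a heatmap, which the matrix alone does not give. One possible way to get both from the `crossprod()` result is sketched below, using base R only. The names `overlap`, `long`, and `missing_share` are just illustrative choices, and the sketch assumes the indicator columns are strictly 0/1 with no `NA` values.

```r
# Recreate the example data from the question (0/1 missingness indicators).
set.seed(123)
data <- data.frame(
  id = 1:1000,
  age_missing = sample(c(0, 1), 1000, replace = TRUE),
  salary_missing = sample(c(0, 1), 1000, replace = TRUE),
  address_missing = sample(c(0, 1), 1000, replace = TRUE),
  gender_missing = sample(c(0, 1), 1000, replace = TRUE)
)

# Share of rows where both indicators are 1; the diagonal holds the
# share of rows where a single variable is missing.
dd <- as.matrix(data[, 2:5])
overlap <- crossprod(dd) / nrow(dd)

# Long format: one row per (var1, var2) pair, as in the desired output.
long <- as.data.frame(as.table(overlap),
                      responseName = "missing_share",
                      stringsAsFactors = FALSE)
names(long)[1:2] <- c("var1", "var2")

# Optional: drop the "_missing" suffix so the labels read "age", "gender", ...
long$var1 <- sub("_missing$", "", long$var1)
long$var2 <- sub("_missing$", "", long$var2)
print(head(long))

# Heatmap of the matrix, without clustering or rescaling, so the cells
# stay in the original variable order and show the raw proportions.
heatmap(overlap, Rowv = NA, Colv = NA, scale = "none", symm = TRUE,
        margins = c(10, 10))
```

If the real data stores the values themselves rather than 0/1 flags, something like `dd <- is.na(as.matrix(data[, -1])) * 1` could build the indicator matrix first; that step is an assumption about the data layout, not something shown in the question.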