IJH - 1 month ago 11
R Question

# Aligning a data frame with missing values

I'm using a data frame with many `NA` values. While I'm able to fit a linear model, I can't then line up the model's fitted values with the original data, because of the missing values and the lack of an indicator column.

Here's a reproducible example:

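```r
library(MASS)
dat <- Aids2
# Set column 3 (diag) to NA in about 100 randomly chosen rows
dat[floor(runif(100, min = 1, max = nrow(dat))), 3] <- NA
# Create a model
model <- lm(death ~ diag + age, data = dat)
# The lengths differ because lm() drops the rows containing NA
length(fitted.values(model))
# 2745 (varies between runs, since the NA rows are random)
nrow(dat)
# 2843
```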
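The missing indicator column can be built with `complete.cases()` from R's stats package. A minimal sketch, assuming a fixed seed (`set.seed(1)`, not in the original) and a hypothetical column name `used_in_fit`; it flags exactly the rows that `lm()` keeps under R's default `na.action` setting (`na.omit`):

```r
library(MASS)  # provides the Aids2 data set

set.seed(1)  # assumption: fixed seed so the run is repeatable
dat <- Aids2
dat[floor(runif(100, min = 1, max = nrow(dat))), 3] <- NA

# Indicator column (hypothetical name): TRUE when every variable the model
# uses is present, i.e. the rows lm() keeps under na.action = na.omit
model_vars <- c("death", "diag", "age")
dat$used_in_fit <- complete.cases(dat[, model_vars])

model <- lm(death ~ diag + age, data = dat)

# The number of flagged rows matches the number of fitted values
sum(dat$used_in_fit) == length(fitted(model))
```

Because the formula only mentions `death`, `diag` and `age`, the extra `used_in_fit` column has no effect on the fit.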

There are three ways to get fitted values that line up with the original rows:

1. pad the fitted values with `NA` ourselves;
2. use `predict()` to compute fitted values;
3. drop incomplete cases ourselves and pass only complete cases to `lm()`.

Option 1

```r
## row indicator: positions of the rows lm() dropped because of NA
id <- model$na.action
fitted.full <- rep(NA, nrow(dat))
fitted.full[-id] <- fitted(model)
nrow(dat)
# 2843
length(fitted.full)
# 2843
sum(!is.na(fitted.full))
# 2745
```
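The padding in Option 1 can also be left to `lm()` itself: fitting with `na.action = na.exclude` (a standard R option, not used in the original answer) drops the incomplete rows for the fit like `na.omit`, but makes `fitted()` and `residuals()` return vectors padded with `NA` to the full length of the data. A minimal sketch, assuming a fixed seed and the hypothetical names `model_ex` and `fitted_full`:

```r
library(MASS)  # provides the Aids2 data set

set.seed(1)  # assumption: fixed seed so the run is repeatable
dat <- Aids2
dat[floor(runif(100, min = 1, max = nrow(dat))), 3] <- NA

# na.exclude drops incomplete rows for the fit, like na.omit,
# but remembers them so fitted() and residuals() are re-padded with NA
model_ex <- lm(death ~ diag + age, data = dat, na.action = na.exclude)

fitted_full <- fitted(model_ex)
length(fitted_full) == nrow(dat)   # one value per original row

# The padded vector can be stored directly as a column of dat
dat$fitted <- fitted_full
```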

Option 2

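```r
## the default na.action for predict.lm() is na.pass, so rows with a
## missing predictor get NA instead of being dropped
## newdata = dat is required: without it, predict() returns only the
## fitted values of the complete rows
pred <- predict(model, newdata = dat)
nrow(dat)
# 2843
length(pred)
# 2843
sum(!is.na(pred))
# 2745
```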
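As a sanity check on Option 2, the predictions on the complete rows should reproduce the model's own fitted values, and the full-length vector can be stored as a column of `dat`. A sketch assuming a fixed seed and the hypothetical names `pred` (as in the answer) and `kept`:

```r
library(MASS)  # provides the Aids2 data set

set.seed(1)  # assumption: fixed seed so the run is repeatable
dat <- Aids2
dat[floor(runif(100, min = 1, max = nrow(dat))), 3] <- NA

model <- lm(death ~ diag + age, data = dat)
pred <- predict(model, newdata = dat)

# Store predictions as a column; rows with a missing predictor get NA
dat$pred <- pred

# On the rows lm() actually used, predict() reproduces the fitted values
kept <- !is.na(pred)
all.equal(unname(pred[kept]), unname(fitted(model)))
```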

Option 3

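Alternatively, you might simply pass a data frame without any `NA` to `lm()`. Note that `na.omit()` drops rows with `NA` in any column, not only in the model's variables; here the only `NA` values are in `diag`, so the result is the same: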

```r
complete.dat <- na.omit(dat)
fit <- lm(death ~ diag + age, data = complete.dat)
nrow(complete.dat)
# 2745
length(fitted(fit))
# 2745
sum(!is.na(fitted(fit)))
# 2745
```
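With Option 3 the fitted values align with `complete.dat`, not with `dat`. If they are needed on the full data, they can be placed back by row name, since `na.omit()` keeps the original row names and `lm()` uses them to name the fitted values. A sketch assuming a fixed seed and a hypothetical `fitted` column:

```r
library(MASS)  # provides the Aids2 data set

set.seed(1)  # assumption: fixed seed so the run is repeatable
dat <- Aids2
dat[floor(runif(100, min = 1, max = nrow(dat))), 3] <- NA

complete.dat <- na.omit(dat)
fit <- lm(death ~ diag + age, data = complete.dat)

# Within complete.dat the alignment is automatic: one fitted value per row
complete.dat$fitted <- fitted(fit)

# To carry the values back to the full data, match on row names
dat$fitted <- NA_real_
dat$fitted[match(names(fitted(fit)), rownames(dat))] <- fitted(fit)
```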

In summary,

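• Option 1 does the "alignment" in a straightforward way by padding with `NA`, but I think people seldom take this approach;
• Option 2 is really simple, but it costs more computation, because `predict()` rebuilds the model matrix from `newdata` and recomputes every prediction;
• Option 3 is my favourite as it keeps everything simple: the fitted values line up row for row with `complete.dat`.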
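To confirm that the three options give the same aligned vector, they can be compared on one simulated data set. A sketch assuming a fixed seed and the hypothetical names `opt1`, `opt2`, `opt3` and `fit3`; Option 1 uses `model$na.action` for the dropped rows, and Option 3 places its values back by row name:

```r
library(MASS)  # provides the Aids2 data set

set.seed(1)  # assumption: fixed seed so the run is repeatable
dat <- Aids2
dat[floor(runif(100, min = 1, max = nrow(dat))), 3] <- NA
model <- lm(death ~ diag + age, data = dat)

# Option 1: pad by hand, skipping the rows lm() dropped
opt1 <- rep(NA_real_, nrow(dat))
opt1[-model$na.action] <- fitted(model)

# Option 2: predict on the full data (NA where a predictor is missing)
opt2 <- unname(predict(model, newdata = dat))

# Option 3: fit on complete cases, then place values by row name
fit3 <- lm(death ~ diag + age, data = na.omit(dat))
opt3 <- rep(NA_real_, nrow(dat))
opt3[match(names(fitted(fit3)), rownames(dat))] <- fitted(fit3)

all.equal(opt1, opt2)
all.equal(opt1, opt3)
```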
Source (Stackoverflow)