IJH - 9 months ago 42

R Question

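I'm using a data frame with many `NA` values. After fitting a linear model, the number of fitted values no longer matches the number of rows in the data, so I can't line the two up.

Here's a reproducible example: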

```
library(MASS)
dat <- Aids2

# Add NA's
dat[floor(runif(100, min = 1, max = nrow(dat))), 3] <- NA

# Create a model
model <- lm(death ~ diag + age, data = dat)

# Different Values
length(fitted.values(model))
# 2745
nrow(dat)
# 2843
```

Answer

There are actually three solutions here:

- pad `NA` to fitted values ourselves;
- use `predict()` to compute fitted values;
- drop incomplete cases ourselves and pass only complete cases to `lm()`.

**Option 1**

```
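## indices of the rows containing `NA`, i.e. the rows `lm()` dropped
id <- attr(na.omit(dat), "na.action")
## start from an all-NA vector, then fill in the rows the model used
fitted <- rep(NA, nrow(dat))
fitted[-id] <- model$fitted.values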
nrow(dat)
# 2843
length(fitted)
# 2843
sum(!is.na(fitted))
# 2745
```
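One caveat with the code above: `na.omit(dat)` looks at every column, so it only matches what `lm()` dropped when the missing values sit in variables the model uses (as they do in the question). A slightly more defensive sketch reads the dropped rows from the model object instead. The toy data frame and the names `toy`, `m`, `dropped`, `used` and `padded` are made up for illustration; they are not part of the original example.

```r
# Toy data standing in for `dat` (hypothetical, not the Aids2 data)
set.seed(1)
toy <- data.frame(x = rnorm(20), z = rnorm(20))
toy$y <- 1 + 2 * toy$x + rnorm(20)
toy$x[c(3, 7)] <- NA   # missing values in a predictor the model uses
toy$z[11] <- NA        # missing value in a column the model ignores

m <- lm(y ~ x, data = toy)

# Rows that lm() itself dropped (empty if nothing was dropped)
dropped <- as.integer(m$na.action)

# Pad: start with all NA, then fill the rows the model actually used
padded <- rep(NA_real_, nrow(toy))
used <- setdiff(seq_len(nrow(toy)), dropped)
padded[used] <- fitted(m)

length(padded)      # same as nrow(toy)
sum(is.na(padded))  # only the rows dropped by lm()
```

Here row 11 still gets a fitted value, because its `NA` is in `z`, which the model never sees.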

**Option 2**

```
## the default NA action for "predict.lm" is "na.pass"
pred <- predict(model, newdata = dat) ## has to use "newdata = dat" here!
nrow(dat)
# 2843
length(pred)
# 2843
sum(!is.na(pred))
# 2745
```
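To see why `newdata` matters, here is a small sketch on made-up data (the `toy` data frame and the other names are hypothetical). Without `newdata`, `predict()` just returns the stored fitted values, which are as short as before; with `newdata`, rows with missing predictors come back as `NA`, and the remaining predictions agree with the fitted values.

```r
# Toy data standing in for `dat` (hypothetical, not the Aids2 data)
set.seed(1)
toy <- data.frame(x = rnorm(20))
toy$y <- 1 + 2 * toy$x + rnorm(20)
toy$x[c(3, 7)] <- NA

m <- lm(y ~ x, data = toy)

short <- predict(m)                 # no newdata: one value per row used in the fit
full  <- predict(m, newdata = toy)  # one value per row of toy, NA where x is missing

length(short)
length(full)

# On the complete rows, the predictions are just the fitted values
ok <- !is.na(toy$x)
same <- isTRUE(all.equal(unname(full[ok]), unname(fitted(m))))
same
```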

**Option 3**

Alternatively, you might simply pass a data frame without any `NA` to `lm()`:

```
complete.dat <- na.omit(dat)
fit <- lm(death ~ diag + age, data = complete.dat)
nrow(complete.dat)
# 2745
length(fit$fitted)
# 2745
sum(!is.na(fit$fitted))
# 2745
```
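Note that `na.omit(dat)` drops a row if any column is missing, including columns the model does not use. If that throws away too much, one possible variant is to filter on the model variables only and then store the fitted values next to the data. This is a sketch on made-up data; `toy`, `toy_cc`, `vars` and `keep` are hypothetical names.

```r
# Toy data standing in for `dat` (hypothetical, not the Aids2 data)
set.seed(1)
toy <- data.frame(x = rnorm(20), z = rnorm(20))
toy$y <- 1 + 2 * toy$x + rnorm(20)
toy$x[c(3, 7)] <- NA   # missing values in a model variable
toy$z[11] <- NA        # missing value in an unrelated column

# Keep rows that are complete in the variables the model needs
vars <- all.vars(y ~ x)
keep <- complete.cases(toy[, vars])
toy_cc <- toy[keep, ]

fit <- lm(y ~ x, data = toy_cc)

# Fitted values now line up with the rows of toy_cc one to one
toy_cc$fitted <- fitted(fit)

nrow(toy_cc)
nrow(na.omit(toy))  # one row fewer, because of the NA in z
```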

In summary, **Option 1** does the "alignment" in a straightforward manner by padding `NA`, but I think people seldom take this approach; **Option 2** is really simple, but it is more computationally costly; **Option 3** is my favourite as it keeps all things simple.
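Not one of the three options above, but closely related to Option 1: `lm()` accepts `na.action = na.exclude`, in which case `fitted()` and `residuals()` do the `NA` padding for you. A minimal sketch on made-up data (the `toy` data frame and the name `m_ex` are hypothetical):

```r
# Toy data standing in for `dat` (hypothetical, not the Aids2 data)
set.seed(1)
toy <- data.frame(x = rnorm(20))
toy$y <- 1 + 2 * toy$x + rnorm(20)
toy$x[c(3, 7)] <- NA

# Same fit as with the default na.omit, but the dropped rows are remembered
m_ex <- lm(y ~ x, data = toy, na.action = na.exclude)

# The extractor functions pad with NA, so the result lines up with toy
toy$fitted <- fitted(m_ex)
toy$resid  <- residuals(m_ex)

length(fitted(m_ex))        # nrow(toy)
length(m_ex$fitted.values)  # the raw component is still unpadded
```

The padding is done by the extractor functions, so use `fitted(m_ex)` rather than `m_ex$fitted.values` when you want the aligned vector.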

Source (Stackoverflow)