Ryan Rothman - 1 year ago 117
R Question

# How to subset data in R without losing NA rows?

I have some data that I am looking at in R. One particular column, titled "Height", contains a few rows of NA.

I am looking to subset my data-frame so that all Heights above a certain value are excluded from my analysis.

```r
df2 <- subset(df1, Height < 40)
```

However, whenever I do this, R automatically removes all rows that contain NA values for Height. I do not want this. I have tried adding an `na.rm` argument:

```r
f1 <- function(x, na.rm = FALSE) {
  df2 <- subset(x, Height < 40)
}
f1(df1, na.rm = FALSE)
```

but this does not seem to do anything; the rows with NA still end up disappearing from my data-frame. Is there a way of subsetting my data as such, without losing the NA rows?

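If you use the `subset` function, watch out for this note in its documentation (`?subset`):

```
For ordinary vectors, the result is simply ‘x[subset & !is.na(subset)]’.
```

So only elements where the condition is `TRUE` are retained. Elements where the condition evaluates to `NA` are dropped, and the same holds for rows of a data frame.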
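As a quick check of that documented behaviour, here is a minimal sketch on a plain vector. The vector `h` and its values are made up for illustration; they are not from the question's data.

```r
# Toy vector (hypothetical values) with two missing entries
h <- c(NA, 2, 4, NA, 50, 60)

# The comparison propagates NA: NA TRUE TRUE NA FALSE FALSE
cond <- h < 40

# subset() treats NA in the condition as "do not keep"
via_subset <- subset(h, h < 40)

# The equivalent manual indexing quoted from ?subset
via_index <- h[cond & !is.na(cond)]

print(via_subset)
print(identical(via_subset, via_index))
```

Both approaches should return only the values 2 and 4, with the `NA` entries silently dropped.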

If you want to keep `NA` cases, add a logical OR condition to tell R not to drop them:

```r
subset(df1, Height < 40 | is.na(Height))
# or `df1[df1$Height < 40 | is.na(df1$Height), ]`
```

Don't use plain logical indexing on its own (the reason is explained below):

```r
df2 <- df1[df1$Height < 40, ]
```

Example

```r
df1 <- data.frame(Height = c(NA, 2, 4, NA, 50, 60), y = 1:6)

subset(df1, Height < 40 | is.na(Height))

#  Height y
#1     NA 1
#2      2 2
#3      4 3
#4     NA 4

df1[df1$Height < 40, ]

#     Height  y
#NA       NA NA
#2         2  2
#3         4  3
#NA.1     NA NA
```

The latter fails because indexing by `NA` gives `NA`: each `NA` in the logical index produces a row filled entirely with `NA`, so the `y` values of those rows are lost. Consider this simple example with a vector:

```r
x <- 1:4
ind <- c(NA, TRUE, NA, FALSE)
x[ind]
# [1] NA  2 NA
```

We need to somehow replace those `NA` with `TRUE`. The most straightforward way is to add another "or" condition `is.na(ind)`:

```r
x[ind | is.na(ind)]
# [1] 1 2 3
```

This is exactly what happens in your situation. If your `Height` column contains `NA`, then the logical operation `Height < 40` produces a mix of `TRUE` / `FALSE` / `NA`, so we need to replace each `NA` with `TRUE` as above.
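To make that concrete, here is a small sketch that builds the logical index step by step on the example data frame from above. The variable names `keep` and `df2` are just illustrative choices.

```r
# Same example data as earlier in this answer
df1 <- data.frame(Height = c(NA, 2, 4, NA, 50, 60), y = 1:6)

# Step 1: the raw comparison is a mix of TRUE / FALSE / NA
keep <- df1$Height < 40
print(keep)

# Step 2: turn every NA in the index into TRUE so those rows are kept
keep[is.na(keep)] <- TRUE

# Step 3: index with the cleaned-up logical vector
df2 <- df1[keep, ]
print(df2)
```

This should give the same rows as `subset(df1, Height < 40 | is.na(Height))`. Keeping the index in its own variable can make it easier to inspect when a condition gets more complicated.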
