Ryogi - 1 year ago 143
R Question

# Replacing NAs with latest non-NA value

In a data.frame (or data.table), I would like to "fill forward" NAs with the closest previous non-NA value. A simple example, using vectors (instead of a `data.frame`), is the following:

``````> y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
``````

I would like a function `fill.NAs()` that allows me to construct `yy` such that:

``````> yy
[1] NA  2  2  2  2  3  3  4  4  4
``````

I need to repeat this operation for many small `data.frame`s (~30-50 MB each, ~1 TB in total), where a row is NA if all its entries are. What is a good way to approach the problem?

The ugly solution I cooked up uses this function:

``````last <- function (x){
  x[length(x)]
}

fill.NAs <- function(isNA){
  if (isNA[1] == 1) {
    isNA[1:max({which(isNA==0)[1]-1},1)] <- 0 # first is NAs
                                              # can't be forward filled
  }
  isNA.neg <- isNA.pos <- isNA.diff <- diff(isNA)
  isNA.pos[isNA.diff < 0] <- 0
  isNA.neg[isNA.diff > 0] <- 0
  which.isNA.neg <- which(as.logical(isNA.neg))
  if (length(which.isNA.neg)==0) return(NULL) # generates warnings later, but works
  which.isNA.pos <- which(as.logical(isNA.pos))
  which.isNA <- which(as.logical(isNA))
  if (length(which.isNA.neg)==length(which.isNA.pos)){
    replacement <- rep(which.isNA.pos[2:length(which.isNA.neg)],
                       which.isNA.neg[2:max(length(which.isNA.neg)-1,2)] -
                         which.isNA.pos[1:max(length(which.isNA.neg)-1,1)])
    replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
  } else {
    replacement <- rep(which.isNA.pos[1:length(which.isNA.neg)], which.isNA.neg - which.isNA.pos[1:length(which.isNA.neg)])
    replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
  }
  replacement
}
``````

The function `fill.NAs` is used as follows:

``````y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
isNA <- as.numeric(is.na(y))
replacement <- fill.NAs(isNA)
if (length(replacement)){
  which.isNA <- which(as.logical(isNA))
  to.replace <- which.isNA[which(isNA==0)[1]:length(which.isNA)]
  y[to.replace] <- y[replacement]
}
``````

Output

``````> y
[1] NA  2  2  2  2  3  3  4  4  4
``````

... which seems to work. But, man, is it ugly! Any suggestions?

You probably want to use the `na.locf()` function from the zoo package to carry the last observation forward to replace your NA values.

Here is the beginning of its usage example from the help page:

``````> example(na.locf)

na.lcf> az <- zoo(1:6)

na.lcf> bz <- zoo(c(2,NA,1,4,5,2))

na.lcf> na.locf(bz)
1 2 3 4 5 6
2 2 1 4 5 2

na.lcf> na.locf(bz, fromLast = TRUE)
1 2 3 4 5 6
2 1 1 4 5 2

na.lcf> cz <- zoo(c(NA,9,3,2,3,2))

na.lcf> na.locf(cz)
2 3 4 5 6
9 3 2 3 2
``````
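
As the `cz` example shows, `na.locf()` drops leading NAs by default, which changes the length of the result. For the vector in the question, passing `na.rm = FALSE` should keep the leading NA in place. The following is a minimal sketch, assuming the zoo package is installed; the data frame `df` and its columns are made up for illustration and are not from the question.

```r
library(zoo)  # assumption: zoo is installed

# The vector from the question
y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)

# na.rm = FALSE keeps the leading NA, so yy has the same length as y
yy <- na.locf(y, na.rm = FALSE)
print(yy)

# Hypothetical data frame: fill each column forward independently
df <- data.frame(a = c(NA, 1, NA, 2),
                 b = c(5, NA, NA, 6))
df[] <- lapply(df, na.locf, na.rm = FALSE)
print(df)
```

Assigning into `df[]` keeps the object a data frame while replacing every column, so the same two lines could be applied to each of the many small data frames in a loop.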