Ryogi Ryogi - 3 months ago 13
R Question

Replacing NAs with latest non-NA value

In a data.frame (or data.table), I would like to "fill forward" NAs with the closest previous non-NA value. A simple example, using vectors (instead of a data.frame), is the following:

> y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)


I would like a function fill.NAs() that allows me to construct yy such that:

> yy
[1] NA 2 2 2 2 3 3 4 4 4


I need to repeat this operation for many small data.frames (~30-50 MB each, ~1 TB in total), where a row is NA if all of its entries are NA. What is a good way to approach the problem?

The ugly solution I cooked up uses this function:

last <- function (x){
    x[length(x)]
}

fill.NAs <- function(isNA){
    if (isNA[1] == 1) {
        isNA[1:max({which(isNA==0)[1]-1},1)] <- 0 # first is NAs
                                                  # can't be forward filled
    }
    isNA.neg <- isNA.pos <- isNA.diff <- diff(isNA)
    isNA.pos[isNA.diff < 0] <- 0
    isNA.neg[isNA.diff > 0] <- 0
    which.isNA.neg <- which(as.logical(isNA.neg))
    if (length(which.isNA.neg)==0) return(NULL) # generates warnings later, but works
    which.isNA.pos <- which(as.logical(isNA.pos))
    which.isNA <- which(as.logical(isNA))
    if (length(which.isNA.neg)==length(which.isNA.pos)){
        replacement <- rep(which.isNA.pos[2:length(which.isNA.neg)],
                           which.isNA.neg[2:max(length(which.isNA.neg)-1,2)] -
                           which.isNA.pos[1:max(length(which.isNA.neg)-1,1)])
        replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
    } else {
        replacement <- rep(which.isNA.pos[1:length(which.isNA.neg)], which.isNA.neg - which.isNA.pos[1:length(which.isNA.neg)])
        replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
    }
    replacement
}


The function fill.NAs is used as follows:

y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
isNA <- as.numeric(is.na(y))
replacement <- fill.NAs(isNA)
if (length(replacement)){
    which.isNA <- which(as.logical(isNA))
    to.replace <- which.isNA[which(isNA==0)[1]:length(which.isNA)]
    y[to.replace] <- y[replacement]
}


Output

> y
[1] NA 2 2 2 2 3 3 4 4 4


... which seems to work. But, man, is it ugly! Any suggestions?

Answer

You probably want to use the na.locf() function from the zoo package to carry the last observation forward to replace your NA values.
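
Applied to the vector from the question, a minimal sketch might look like the following. It assumes the zoo package is installed. By default na.locf() drops leading NAs, so na.rm = FALSE is passed to keep the result the same length as the input. The data.frame df is a made-up example, not data from the question.

```r
library(zoo)  # assumption: zoo is installed

y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)

# na.rm = FALSE keeps the leading NA, so yy has the same length as y
yy <- na.locf(y, na.rm = FALSE)

# Hypothetical data.frame for illustration: fill each column independently
df <- data.frame(a = y, b = rev(y))
df[] <- lapply(df, na.locf, na.rm = FALSE)
```

Filling column by column with lapply() keeps the data.frame structure intact, and a row that is entirely NA ends up as a copy of the previous row.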

Here is the beginning of its usage example from the help page:

> example(na.locf)

na.lcf> az <- zoo(1:6)

na.lcf> bz <- zoo(c(2,NA,1,4,5,2))

na.lcf> na.locf(bz)
1 2 3 4 5 6 
2 2 1 4 5 2 

na.lcf> na.locf(bz, fromLast = TRUE)
1 2 3 4 5 6 
2 1 1 4 5 2 

na.lcf> cz <- zoo(c(NA,9,3,2,3,2))

na.lcf> na.locf(cz)
2 3 4 5 6 
9 3 2 3 2
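
If adding a package dependency is not an option, the same carry-forward idea can be sketched in base R. This is only an illustration under stated assumptions: the name fill_forward and the small data.frame df are invented here and are not part of zoo or of the question.

```r
# Carry the last non-NA value forward; leading NAs stay NA.
fill_forward <- function(x) {
  # Position of the most recent non-NA element at each index (0 if none yet)
  idx <- cummax(seq_along(x) * (!is.na(x)))
  idx[idx == 0] <- NA  # nothing to carry forward yet
  x[idx]
}

y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
yy <- fill_forward(y)

# Hypothetical data.frame in which rows 2 and 3 are entirely NA
df <- data.frame(a = c(1, NA, NA, 4),
                 b = c("x", NA, NA, "w"),
                 stringsAsFactors = FALSE)
df[] <- lapply(df, fill_forward)
```

The helper is vectorised (no explicit loop), which matters when the operation is repeated over many data.frames, but it has not been benchmarked against na.locf() here.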