ckluss ckluss - 23 days ago 7
R Question

linear interpolate missing values in time series

I would like to add all missing dates between min and max date in a

data.frame
and linear interpolate all missing values, like

df <- data.frame(date = as.Date(c("2015-10-05","2015-10-08","2015-10-09",
"2015-10-12","2015-10-14")),
value = c(8,3,9,NA,5))

date value
2015-10-05 8
2015-10-08 3
2015-10-09 9
2015-10-12 NA
2015-10-14 5

date value approx
2015-10-05 8 8
2015-10-06 NA 6.33
2015-10-07 NA 4.67
2015-10-08 3 3
2015-10-09 9 9
2015-10-10 NA 8.20
2015-10-11 NA 7.40
2015-10-12 NA 6.60
2015-10-13 NA 5.80
2015-10-14 5 5


Is there a clear solution with
dplyr
and
approx
?
(I do not like my 10 line
for
loop code.)

Answer

Here is one way. I created a data frame with a sequence of date using the first and last date. Using full_join() in the dplyr package, I merged the data frame and mydf. I then used na.approx() in the zoo package to handle the interpolation in the mutate() part.

mydf <- data.frame(date = as.Date(c("2015-10-05","2015-10-08","2015-10-09",
                                    "2015-10-12","2015-10-14")),       
                   value = c(8,3,9,NA,5))

library(dplyr)
library(zoo)

data.frame(date = seq(mydf$date[1], mydf$date[nrow(mydf)], by = 1)) %>%
full_join(mydf, by = "date") %>%
mutate(approx = na.approx(value))

#         date value   approx
#1  2015-10-05     8 8.000000
#2  2015-10-06    NA 6.333333
#3  2015-10-07    NA 4.666667
#4  2015-10-08     3 3.000000
#5  2015-10-09     9 9.000000
#6  2015-10-10    NA 8.200000
#7  2015-10-11    NA 7.400000
#8  2015-10-12    NA 6.600000
#9  2015-10-13    NA 5.800000
#10 2015-10-14     5 5.000000
Comments