ckluss - 1 year ago 52
R Question

# linear interpolate missing values in time series

I would like to add all missing dates between min and max date in a

`data.frame`
and linear interpolate all missing values, like

``````df <- data.frame(date = as.Date(c("2015-10-05","2015-10-08","2015-10-09",
"2015-10-12","2015-10-14")),
value = c(8,3,9,NA,5))

date value
2015-10-05     8
2015-10-08     3
2015-10-09     9
2015-10-12    NA
2015-10-14     5

date value approx
2015-10-05     8      8
2015-10-06    NA   6.33
2015-10-07    NA   4.67
2015-10-08     3      3
2015-10-09     9      9
2015-10-10    NA   8.20
2015-10-11    NA   7.40
2015-10-12    NA   6.60
2015-10-13    NA   5.80
2015-10-14     5      5
``````

Is there a clear solution with
`dplyr`
and
`approx`
?
(I do not like my 10 line
`for`
loop code.)

Here is one way. I created a data frame with a sequence of date using the first and last date. Using `full_join()` in the `dplyr` package, I merged the data frame and `mydf`. I then used `na.approx()` in the zoo package to handle the interpolation in the `mutate()` part.

``````mydf <- data.frame(date = as.Date(c("2015-10-05","2015-10-08","2015-10-09",
"2015-10-12","2015-10-14")),
value = c(8,3,9,NA,5))

library(dplyr)
library(zoo)

data.frame(date = seq(mydf\$date[1], mydf\$date[nrow(mydf)], by = 1)) %>%
full_join(mydf, by = "date") %>%
mutate(approx = na.approx(value))

#         date value   approx
#1  2015-10-05     8 8.000000
#2  2015-10-06    NA 6.333333
#3  2015-10-07    NA 4.666667
#4  2015-10-08     3 3.000000
#5  2015-10-09     9 9.000000
#6  2015-10-10    NA 8.200000
#7  2015-10-11    NA 7.400000
#8  2015-10-12    NA 6.600000
#9  2015-10-13    NA 5.800000
#10 2015-10-14     5 5.000000
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download