sbg sbg - 3 months ago 8
R Question

How to create "NA" for missing data in a time series

I have several files of data that look like this:

X code year month day pp
1 4515 1953 6 1 0
2 4515 1953 6 2 0
3 4515 1953 6 3 0
4 4515 1953 6 4 0
5 4515 1953 6 5 3.5


Sometimes there is data missing, but I don't have NAs, the rows simply don't exist. I need to create NAs when the data is missing. I though I could start by identifying when that occurs by converting it to a zoo object and check for strict regularity (I never used zoo before), I used the following code:

z.date<-paste(CET$year, CET$month, CET$day, sep="/")
z <- read.zoo(CET, order.by= z.date )
reg<-is.regular(z, strict = TRUE)


But the answer is always true!

Can anyone tell me why is not working? Or even better, tell me a way to create NAs when the data is missing (with or without zoo package)?

thanks

Answer

The seq function has some interesting features that you can use to easily generate a complete sequence of dates. For example, the following code can be used to generate a sequence of dates starting on April 25:

Edit: This feature is documented in ?seq.Date

start = as.Date("2011/04/25")
full <- seq(start, by='1 day', length=15)
full

 [1] "2011-04-25" "2011-04-26" "2011-04-27" "2011-04-28" "2011-04-29"
 [6] "2011-04-30" "2011-05-01" "2011-05-02" "2011-05-03" "2011-05-04"
[11] "2011-05-05" "2011-05-06" "2011-05-07" "2011-05-08" "2011-05-09"

Now use the same principle to generate some data with "missing" rows, by generating the sequence for every 2nd day:

partial <- data.frame(
    date=seq(start, by='2 day', length=6),
    value=1:6
)
partial

        date value
1 2011-04-25     1
2 2011-04-27     2
3 2011-04-29     3
4 2011-05-01     4
5 2011-05-03     5
6 2011-05-05     6

To answer your question, one can use vector subscripting or the match function to create a dataset with NAs:

with(partial, value[match(full, date)])
 [1]  1 NA  2 NA  3 NA  4 NA  5 NA  6 NA NA NA NA

To combine this result with the original full data:

data.frame(Date=full, value=with(partial, value[match(full, date)]))
         Date value
1  2011-04-25     1
2  2011-04-26    NA
3  2011-04-27     2
4  2011-04-28    NA
5  2011-04-29     3
6  2011-04-30    NA
7  2011-05-01     4
8  2011-05-02    NA
9  2011-05-03     5
10 2011-05-04    NA
11 2011-05-05     6
12 2011-05-06    NA
13 2011-05-07    NA
14 2011-05-08    NA
15 2011-05-09    NA
Comments