Alex Alex - 3 months ago 8
R Question

Adding missing rows to time series data in r

I have a data set of GPS coordinates, however some of the times are missing. I want to add the missing rows in with the appropriate timestamp and NA's for the lat and lon columns

This is how the data is currently formatted:

timestamp lon lat id date time
<time> <dbl> <dbl> <int> <date> <chr>
1 2012-08-01 03:59:00 36.92288 0.3508941 1 2012-08-01 03:59:00
2 2012-08-01 03:59:01 36.92288 0.3508901 1 2012-08-01 03:59:01
3 2012-08-01 03:59:02 36.92288 0.3508868 1 2012-08-01 03:59:02
4 2012-08-01 03:59:03 36.92288 0.3508828 1 2012-08-01 03:59:03
5 2012-08-01 03:59:05 36.92288 0.3508845 1 2012-08-01 03:59:05
6 2012-08-01 03:59:06 36.92288 0.3508866 1 2012-08-01 03:59:06
7 2012-08-01 03:59:07 36.92288 0.3508885 1 2012-08-01 03:59:07
8 2012-08-01 03:59:08 36.92288 0.3508903 1 2012-08-01 03:59:08
9 2012-08-01 03:59:09 36.92288 0.3508915 1 2012-08-01 03:59:09


as you can see timestamp 2012-08-01 03:59:04 is missing I would like to have something like this at the end

timestamp lon lat id date time
<time> <dbl> <dbl> <int> <date> <chr>
1 2012-08-01 03:59:00 36.92288 0.3508941 1 2012-08-01 03:59:00
2 2012-08-01 03:59:01 36.92288 0.3508901 1 2012-08-01 03:59:01
3 2012-08-01 03:59:02 36.92288 0.3508868 1 2012-08-01 03:59:02
4 2012-08-01 03:59:03 36.92288 0.3508828 1 2012-08-01 03:59:03
5 2012-08-01 03:59:04 NA NA 1 2012-08-01 03:59:04
6 2012-08-01 03:59:05 36.92288 0.3508845 1 2012-08-01 03:59:05
7 2012-08-01 03:59:06 36.92288 0.3508866 1 2012-08-01 03:59:06
8 2012-08-01 03:59:07 36.92288 0.3508885 1 2012-08-01 03:59:07
9 2012-08-01 03:59:08 36.92288 0.3508903 1 2012-08-01 03:59:08
10 2012-08-01 03:59:09 36.92288 0.3508915 1 2012-08-01 03:59:09


Any help with this would be much appreciate!

Answer

Use rbind.fill from the plyr package:

ts_data <- read.table(text="timestamp,lon,lat,id,date,time
2012-08-01 03:59:00,36.92288,0.3508941,1,2012-08-01,03:59:00
2012-08-01 03:59:01,36.92288,0.3508901,1,2012-08-01,03:59:01
2012-08-01 03:59:02,36.92288,0.3508868,1,2012-08-01,03:59:02
2012-08-01 03:59:03,36.92288,0.3508828,1,2012-08-01,03:59:03
2012-08-01 03:59:05,36.92288,0.3508845,1,2012-08-01,03:59:05
2012-08-01 03:59:06,36.92288,0.3508866,1,2012-08-01,03:59:06
2012-08-01 03:59:07,36.92288,0.3508885,1,2012-08-01,03:59:07
2012-08-01 03:59:08,36.92288,0.3508903,1,2012-08-01,03:59:08
2012-08-01 03:59:09,36.92288,0.3508915,1,2012-08-01,03:59:09", 
                      header=T, stringsAsFactors=F, sep=",")

##Convert timestamp to datetime
ts_data$timestamp <- ts_data$timestamp <- as.POSIXct(strftime(ts_data$timestamp))

##Get full sequence
full_sequence <- seq(from=min(ts_data$timestamp), 
                     to=max(ts_data$timestamp), by="s")

##Grab the missing sequence
missing_sequence <- full_sequence[!(full_sequence %in% ts_data$timestamp)]

##Make a data.frame out of the missing sequence
missing_df <- data.frame(timestamp = missing_sequence, 
                         id = rep(1,length(missing_sequence)), 
                         date = strftime(missing_sequence, format = "%Y-%m-%d"), 
                         time=strftime(missing_sequence, format = "%H:%M:%S"))
##Combine the two
new_ts_data <- plyr::rbind.fill(ts_data, missing_df)
##Order by timestamp
new_ts_data <- new_ts_data[order(new_ts_data$timestamp),]

Desired output:

 timestamp           lon      lat       id date       time    
 2012-08-01 03:59:00 36.92288 0.3508941 1  2012-08-01 03:59:00
 2012-08-01 03:59:01 36.92288 0.3508901 1  2012-08-01 03:59:01
 2012-08-01 03:59:02 36.92288 0.3508868 1  2012-08-01 03:59:02
 2012-08-01 03:59:03 36.92288 0.3508828 1  2012-08-01 03:59:03
 2012-08-01 03:59:04       NA        NA 1  2012-08-01 03:59:04
 2012-08-01 03:59:05 36.92288 0.3508845 1  2012-08-01 03:59:05
 2012-08-01 03:59:06 36.92288 0.3508866 1  2012-08-01 03:59:06
 2012-08-01 03:59:07 36.92288 0.3508885 1  2012-08-01 03:59:07
 2012-08-01 03:59:08 36.92288 0.3508903 1  2012-08-01 03:59:08
 2012-08-01 03:59:09 36.92288 0.3508915 1  2012-08-01 03:59:09

Edit to work with the appropriate dataset

The filtered dataset is saved here and a fully working script is here.

Comments