YQ.Wang - 2 months ago
R Question

How to add NA rows to an incomplete data frame based on a complete index?

Given the incomplete data frame df and the complete index t:

t = seq(as.POSIXct("2016-01-01 00:05:00"), as.POSIXct("2016-01-01 01:00:00"), by = '5 min')
index<-t[c(1,2,4:7,9,12)]
a<-(1:8)
b<-(1:8)
df<-data.frame(index,a,b)


My current approach adds the missing rows with the following code:

index <- t                           # complete index
a <- rep(NA, 12)
b <- rep(NA, 12)
empty_df <- data.frame(index, a, b)  # build a complete NA data frame
for (i in 1:12) {
  # compare row i with the complete index; insert an NA row on a mismatch
  if (!(df$index[i] == empty_df$index[i]))
    df <- rbind(df[seq_len(i - 1), ], empty_df[i, ], df[i:length(df$index), ])
}


However, my solution has two problems:


  1. It cannot handle the case where the first row is missing.

  2. When the data frame is large, the computation takes hours.



Is there an easier way to do this?

Answer

We can do this with merge (base R) or left_join (from dplyr):

library(dplyr)
# start from the complete index and attach the known rows; unmatched rows get NA
data.frame(index = t) %>%
  left_join(df, by = "index")
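
For the base R route, a minimal sketch with merge might look like the following. It rebuilds t and df as defined in the question so that it runs on its own; the result name filled is only illustrative. all.x = TRUE keeps every row of the complete index, and rows with no match in df get NA in a and b.

```r
# Rebuild the example data from the question
t <- seq(as.POSIXct("2016-01-01 00:05:00"),
         as.POSIXct("2016-01-01 01:00:00"), by = "5 min")
df <- data.frame(index = t[c(1, 2, 4:7, 9, 12)], a = 1:8, b = 1:8)

# Left join in base R: keep all rows of the complete index
filled <- merge(data.frame(index = t), df, by = "index", all.x = TRUE)

# Positions of the rows that were added as NA
which(is.na(filled$a))
```

Because merge sorts on the key by default, the result comes back in index order with no loop.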

Or with a join from data.table:

library(data.table)
setDT(df)[data.table(index=t), on = "index"]
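
A join does not depend on row positions, so the first concern in the question (a missing first row) should not arise. As a quick check, here is a sketch using base R merge on a hypothetical variant of the data, df2, in which the first timestamp is dropped; df2 and filled2 are names invented for this illustration.

```r
# Complete index as in the question
t <- seq(as.POSIXct("2016-01-01 00:05:00"),
         as.POSIXct("2016-01-01 01:00:00"), by = "5 min")

# Hypothetical incomplete data with the FIRST row missing
df2 <- data.frame(index = t[c(2, 4:7, 9, 12)], a = 1:7, b = 1:7)

# Same left join as before; the first row of the result is filled with NA
filled2 <- merge(data.frame(index = t), df2, by = "index", all.x = TRUE)
head(filled2, 3)
```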