MJ30 MJ30 - 1 month ago 11
R Question

Combine data within two data frames in R

I have two data frames that I want to combine; however, I only want to keep one date. df1 will be the months from Jan 01, 2013 to Oct, 1, 2016. df2 will contain the frequency of occurrence of an event. If there was no event that month, df2 will not show a value.

df1 <- data.frame(date=seq(as.Date("2013-01-01"), as.Date("2016-10-01"), by="month"))

df1
date Freq
1 2013-01-01 0
2 2013-02-01 0
3 2013-03-01 0
4 2013-04-01 0
5 2013-05-01 0
...

df2
date Freq
1 2013-03-01 1
2 2013-08-01 2
3 2014-04-01 5
4 2014-05-01 2
5 2014-06-01 5
...


I want the new data frame to look like the following.

date Freq
1 2013-01-01 0
2 2013-02-01 0
3 2013-03-01 1
4 2013-04-01 0
5 2013-05-01 0
6 2013-06-01 0
7 2013-07-01 0
8 2013-08-01 2
9 2013-09-01 0
...

Answer

You can merge with all.x=TRUE and then set the NA's resulting from the merge to zero:

out <- merge(df1,df2,all.x=TRUE)
out[is.na(out)] <- 0
head(out,10)
##         date Freq
##1  2013-01-01    0
##2  2013-02-01    0
##3  2013-03-01    1
##4  2013-04-01    0
##5  2013-05-01    0
##6  2013-06-01    0
##7  2013-07-01    0
##8  2013-08-01    2
##9  2013-09-01    0
##10 2013-10-01    0

Data: where df1 is created as in the OP:

df1 <- data.frame(date=seq(as.Date("2013-01-01"), as.Date("2016-10-01"), by="month"))

df1 <- structure(list(date = structure(c(15706, 15737, 15765, 15796, 
15826, 15857, 15887, 15918, 15949, 15979, 16010, 16040, 16071, 
16102, 16130, 16161, 16191, 16222, 16252, 16283, 16314, 16344, 
16375, 16405, 16436, 16467, 16495, 16526, 16556, 16587, 16617, 
16648, 16679, 16709, 16740, 16770, 16801, 16832, 16861, 16892, 
16922, 16953, 16983, 17014, 17045, 17075), class = "Date")), .Names = "date", row.names = c(NA, 
-46L), class = "data.frame")
##         date
##1  2013-01-01
##2  2013-02-01
##3  2013-03-01
##4  2013-04-01
##5  2013-05-01
## ...
##42 2016-06-01
##43 2016-07-01
##44 2016-08-01
##45 2016-09-01
##46 2016-10-01

df2 <- structure(list(date = structure(c(15765, 15918, 16161, 16191, 
16222), class = "Date"), Freq = c(1L, 2L, 5L, 2L, 5L)), .Names = c("date", 
"Freq"), row.names = c(NA, -5L), class = "data.frame")
##        date Freq
##1 2013-03-01    1
##2 2013-08-01    2
##3 2014-04-01    5
##4 2014-05-01    2
##5 2014-06-01    5