Ashley Thomas Ashley Thomas - 4 months ago 18
R Question

Merging a dataset based on ID and date

I have two datasets each that are monthly summaries of different variables for households. I want to merge the two based on the household ID and the month.

df1 one looks like this:

hh_ids date total
<chr> <chr> <dbl>
1 KELDK13 2013-8-1 1
2 KMOMB02 2013-2-1 1
3 KMOMB02 2013-5-1 2
4 KMOMB04 2013-7-1 2
5 KMOMB04 2013-9-1 1
6 KMOMB06 2013-6-1 1
7 KMOMB14 2013-8-1 1
8 KMOMB16 2013-6-1 1
9 KMOMB17 2012-10-1 1
10 KMOMB17 2012-11-1 2


and the first 10 rows of df2 looks like:

hh_ids date income consumption alcohol cleaning_materials clothing
1 KELDK01 2012-11-1 62.70588 40.52941 0 0.000000 0.000000
2 KELDK01 2012-12-1 17.64706 42.43530 0 1.058824 7.058824
3 KELDK01 2013-01-1 91.76471 48.23529 0 0.000000 0.000000
4 KELDK01 2013-02-1 91.76470 107.52940 0 0.000000 0.000000
5 KELDK01 2013-03-1 116.47060 114.47060 0 0.000000 0.000000
6 KELDK01 2013-04-1 124.41180 118.29410 0 2.705882 17.647060
7 KELDK01 2013-05-1 137.23530 105.00000 0 1.411765 1.882353
8 KELDK01 2013-06-1 131.52940 109.54120 0 4.352942 2.941176
9 KELDK01 2013-07-1 121.52940 113.47060 0 2.352941 25.882350
10 KELDK01 2013-08-1 123.32940 86.50588 0 2.588235 2.941176


I want the "total" column from to be added on as a column in df2 with the matched hh_ids and date.

I have tried to do the following:

df3<-merge(df2,df1,by=c("hh_ids","date"))


However, my df2 has 53 rows, and df1 has many more, and the resulting df3 only has 14 rows. Any suggestions would be much appreciated!

Answer

If you want to retain all rows from your df2 even if they do not match to anything in df1 then you can use the all parameter in your merge:

df3 <- merge(df2, df1, by=c("hh_ids","date"), all.x=TRUE)

This is equivalent to doing a LEFT JOIN in SQL between df2 on the left and df1 on the right.