dleal - 3 months ago
Python Question

Merge dataframes on nearest datetime / timestamp

I have two data frames as follows:

import pandas as pd

A = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "date":["06/22/2014","07/02/2014","01/01/2015","01/01/1991","08/02/1999"]})

B = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "date":["02/15/2015","06/30/2014","07/02/1999","10/05/1990","06/24/2014"], "value": ["3","5","1","7","8"] })


Which look like the following:

>>> A
ID date
0 A 2014-06-22
1 A 2014-07-02
2 C 2015-01-01
3 B 1991-01-01
4 B 1999-08-02

>>> B
ID date value
0 A 2015-02-15 3
1 A 2014-06-30 5
2 C 1999-07-02 1
3 B 1990-10-05 7
4 B 2014-06-24 8


I want to merge A with the values of B using the nearest date. In this example none of the dates match, but it could be the case that some do.

The output should be something like this:

>>> C
ID date value
0 A 06/22/2014 8
1 A 07/02/2014 5
2 C 01/01/2015 3
3 B 01/01/1991 7
4 B 08/02/1999 1


It seems to me that there should be a native function in pandas that would allow this.

Note: a similar question has been asked here:
pandas.merge: match the nearest time stamp >= the series of timestamps

Answer

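You can use reindex with method='nearest' to look up, for each date in A, the row of B with the closest date, and then merge:

# Parse the date strings so they compare as timestamps
A['date'] = pd.to_datetime(A['date'])
B['date'] = pd.to_datetime(B['date'])
# B must be sorted by date: reindex(method='nearest') requires a monotonic index
A.sort_values('date', inplace=True)
B.sort_values('date', inplace=True)

# Re-key B on A's dates, taking the row of B nearest to each date in A
B1 = B.set_index('date').reindex(A.set_index('date').index, method='nearest').reset_index()
print(B1)

# B1 now carries A's dates, so an exact merge on 'date' lines the rows up
print(pd.merge(A, B1, on='date'))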
  ID_x       date ID_y value
0    B 1991-01-01    B     7
1    B 1999-08-02    C     1
2    A 2014-06-22    B     8
3    A 2014-07-02    A     5
4    C 2015-01-01    A     3

You can also pass the suffixes parameter to control how the overlapping ID columns are named:

print(pd.merge(A, B1, on='date', suffixes=('_', '')))
  ID_       date ID value
0   B 1991-01-01  B     7
1   B 1999-08-02  C     1
2   A 2014-06-22  B     8
3   A 2014-07-02  A     5
4   C 2015-01-01  A     3
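
As for a native function: newer pandas versions provide pd.merge_asof, which accepts direction='nearest'. The following is a minimal sketch of that alternative, not the approach used above; it assumes a pandas version in which merge_asof supports the direction argument, and the names A_sorted and B_sorted are introduced here only for illustration.

```python
import pandas as pd

A = pd.DataFrame({
    "ID": ["A", "A", "C", "B", "B"],
    "date": ["06/22/2014", "07/02/2014", "01/01/2015", "01/01/1991", "08/02/1999"],
})
B = pd.DataFrame({
    "ID": ["A", "A", "C", "B", "B"],
    "date": ["02/15/2015", "06/30/2014", "07/02/1999", "10/05/1990", "06/24/2014"],
    "value": ["3", "5", "1", "7", "8"],
})

# Parse the month/day/year strings into timestamps
A["date"] = pd.to_datetime(A["date"], format="%m/%d/%Y")
B["date"] = pd.to_datetime(B["date"], format="%m/%d/%Y")

# merge_asof requires both frames to be sorted by the key column
A_sorted = A.sort_values("date")  # hypothetical helper name
B_sorted = B.sort_values("date")  # hypothetical helper name

# Take only date and value from B so that A's ID column is kept unchanged
C = pd.merge_asof(
    A_sorted,
    B_sorted[["date", "value"]],
    on="date",
    direction="nearest",
)
print(C)
```

Like the reindex approach, this matches on the date alone and ignores ID, which is what the expected output in the question shows. If matches should instead be restricted to rows sharing the same ID, merge_asof also takes a by parameter (by="ID"), but that gives a different result from the one asked for here.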