klubow klubow - 1 month ago 7
Python Question

Select closest date (or value) in pandas / python

How to select rows with plr_date closest to mtc_date ?
Preferably without looping over each row.
I would like to end up with rows 13, 28, 43.

I should clarify: from the set of rows with same mtc_date select one row with the closest plr_date.

mtc_date mtc_id plr_date plr_id plr_measure
12 2010-03-15 1 2010-01-31 1 0
13 2010-03-15 1 2010-02-28 1 1
24 2010-06-15 2 2010-01-31 2 12
25 2010-06-15 2 2010-02-28 2 13
26 2010-06-15 2 2010-03-31 2 14
27 2010-06-15 2 2010-04-30 2 15
28 2010-06-15 2 2010-05-31 2 16
36 2010-09-15 2 2010-01-31 2 12
37 2010-09-15 2 2010-02-28 2 13
38 2010-09-15 2 2010-03-31 2 14
39 2010-09-15 2 2010-04-30 2 15
40 2010-09-15 2 2010-05-31 2 16
41 2010-09-15 2 2010-06-30 2 17
42 2010-09-15 2 2010-07-31 2 18
43 2010-09-15 2 2010-08-31 2 19


I would like to add code to reproduce but "It looks like your post is mostly code ...",

import pandas as pd
import numpy as np


### 1 df
plr = pd.concat(\
[pd.DataFrame({'plr_date': pd.date_range("01/01/2010", "12/31/2010", freq="m"),\
'plr_id': 1 , 'plr_measure': 10}),\
pd.DataFrame({'plr_date': pd.date_range("01/01/2010", "12/31/2010", freq="m"),\
'plr_id': 2 , 'plr_measure': 20})])

#plr = plr.reindex(index=range(0, plr.shape[0]), columns=None, fill_value=0)

plr['plr_measure']=np.arange(0, plr.shape[0])


### 2 df
mtc = pd.DataFrame({'mtc_date':pd.to_datetime(['15/01/2010','15/03/2010','15/06/2010','15/09/2010']),\
'mtc_id':(1,1,2,2)})

mtc = mtc.merge(plr, left_on='mtc_id', right_on='plr_id', how='left')



#select only smaller
mtc = mtc[(mtc.plr_date < mtc.mtc_date)]

#leave the bigest

Answer

Based on your comment, here is the updated solution using groupby.

For each group with same mtc_date, find the index that minimizes the absolute difference (in days) between mtc_date and plr_date.

min_indexes = mtc.groupby('mtc_date').apply(lambda x: (x['plr_date'] - x['mtc_date']).apply(lambda y: int(y.days)).abs().argmin())

enter image description here