trench trench - 2 months ago 16
Python Question

Matching range of timestamps in pandas

I am having trouble doing this in pandas

DF1 (my left join):

Name | TimeStart | TimeEnd | Values | Order
John 12/24/2014 08:10:32 12/24/2014 08:14:21 2 1
John 12/24/2014 08:15:03 12/24/2014 08:22:49 2 2


DF2

Name | TimeStart | TimeEnd | Values
John 12/24/2014 08:12:57 12/24/2014 08:13:31 8


TimeStart in DF2 is ALWAYS greater than the TimeStart in DF1 (this typically happens during the interaction). And for it to match, it would have to be less than the next row of data for that individual.

Here is my thought process. Shift the row to columns to see if they match. Then compare the DF2 TimeStart to be > than the initial TimeStart on DF1 but less than the next row TimeStart (same Name).

df1.sort(['Name', 'TimeStart'], ascending=[1, 1], inplace = True)
df1['Name_R'] = df1['Name'].shift(-1)
df1['Matching Row'] = np.where((df1['Name_R'] == df1['Name']), 1, 0)
df1['Next Timestamp'] = np.where(df1['Matching Row'] == 1, df1['TimeStart'].shift(-1), np.datetime64('nat'))
df1['test'] = np.where(df2['TimeStart'] > df1['TimeStart'] < df1['Next Timestamp'], 1, 0)


Edit - is it possible to do this with a asof command? The only trick is that the Name has to match, then we look for the closest timestamp for TimeStart on each file/dataframe.

Answer

There is an "asof join" in pandas 0.19. For your example, just ignore the starting time and join by the most recent ending time.

pd.merge_asof(DF1, DF2, on='TimeEnd')