trench trench - 1 year ago 103
Python Question

Matching range of timestamps in pandas

I am having trouble doing this in pandas

DF1 (my left join):

Name | TimeStart | TimeEnd | Values | Order
John 12/24/2014 08:10:32 12/24/2014 08:14:21 2 1
John 12/24/2014 08:15:03 12/24/2014 08:22:49 2 2


Name | TimeStart | TimeEnd | Values
John 12/24/2014 08:12:57 12/24/2014 08:13:31 8

TimeStart in DF2 is ALWAYS greater than the TimeStart in DF1 (this typically happens during the interaction). And for it to match, it would have to be less than the next row of data for that individual.

Here is my thought process. Shift the row to columns to see if they match. Then compare the DF2 TimeStart to be > than the initial TimeStart on DF1 but less than the next row TimeStart (same Name).

df1.sort(['Name', 'TimeStart'], ascending=[1, 1], inplace = True)
df1['Name_R'] = df1['Name'].shift(-1)
df1['Matching Row'] = np.where((df1['Name_R'] == df1['Name']), 1, 0)
df1['Next Timestamp'] = np.where(df1['Matching Row'] == 1, df1['TimeStart'].shift(-1), np.datetime64('nat'))
df1['test'] = np.where(df2['TimeStart'] > df1['TimeStart'] < df1['Next Timestamp'], 1, 0)

Edit - is it possible to do this with a asof command? The only trick is that the Name has to match, then we look for the closest timestamp for TimeStart on each file/dataframe.

Answer Source

There is an "asof join" in pandas 0.19. For your example, just ignore the starting time and join by the most recent ending time.

pd.merge_asof(DF1, DF2, on='TimeEnd')
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download