EduardoRL - 12 days ago
Python Question

How to check in Python whether the datetimes in a DataFrame column fall within a time interval (vectorized)

I have the following dataframe:

index arrivalTime
0 01/11/2016 06:53:15
1 01/11/2016 06:55:25
2 03/11/2016 16:57:10
3 03/11/2016 16:57:30
4 04/11/2016 18:58:10
5 06/11/2016 19:58:35
6 08/11/2016 20:00:05


I need to know which rows fall between 16:00:00 and 20:00:00, whatever the day. Here is an example of the expected result:

index arrivalTime result
0 01/11/2016 06:53:15 0
1 01/11/2016 06:55:25 0
2 03/11/2016 16:57:10 1
3 03/11/2016 16:57:30 1
4 04/11/2016 18:58:10 1
5 06/11/2016 19:58:35 1
6 08/11/2016 20:00:05 0


I get the above results using the apply function with a nested function, but it is very slow. I would like to vectorize it, but I have not managed to.

import datetime as dt

def function_time(df):
    # The last 8 characters of each string are the time of day (HH:MM:SS)
    df['hora_lle'] = df['arrivalTime'].map(lambda x: x[-8:])

    def classify(hora_lle):
        x = dt.datetime.strptime(hora_lle, '%H:%M:%S').time()
        h1 = dt.datetime.strptime('16:00:00', '%H:%M:%S').time()
        h2 = dt.datetime.strptime('20:00:00', '%H:%M:%S').time()

        # 1 if the time is in [16:00:00, 20:00:00), otherwise 0
        if x < h1:
            return 0
        elif h1 <= x < h2:
            return 1
        elif h2 <= x:
            return 0

    df['seg'] = df['hora_lle'].apply(classify)
    return df


Thank you very much in advance.

Answer

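I think you can use to_datetime first and then create a boolean mask, which astype converts to int. Note that to_datetime reads these strings month-first by default, as the output shows; that does not affect the time-of-day comparison. Because this solution compares only the hour, it selects 16:00:00 through 19:59:59 and cannot include a time of exactly 20:00:00 (minutes and seconds equal to 0):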

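# Assumes pandas is imported as pd and df holds the question's data
df['arrivalTime'] = pd.to_datetime(df['arrivalTime'])
h = df['arrivalTime'].dt.hour
# Hours 16, 17, 18 and 19 are inside the interval
mask = (h > 15) & (h < 20)
df['result'] = mask.astype(int)

print(df)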
   index         arrivalTime  result
0      0 2016-01-11 06:53:15       0
1      1 2016-01-11 06:55:25       0
2      2 2016-03-11 16:57:10       1
3      3 2016-03-11 16:57:30       1
4      4 2016-04-11 18:58:10       1
5      5 2016-06-11 19:58:35       1
6      6 2016-08-11 20:00:05       0
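
If second-level precision matters, one possible variant is to compare the time elapsed since midnight instead of the hour alone. The sketch below is not part of the original answer: it rebuilds the question's sample data, assumes the dates are day/month/year (hence the explicit format), and uses the half-open interval [16:00:00, 20:00:00) from the question's own function. The names since_midnight, start and end are chosen for illustration only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "arrivalTime": [
        "01/11/2016 06:53:15",
        "01/11/2016 06:55:25",
        "03/11/2016 16:57:10",
        "03/11/2016 16:57:30",
        "04/11/2016 18:58:10",
        "06/11/2016 19:58:35",
        "08/11/2016 20:00:05",
    ]
})

# Assumption: the strings are day-first, so the format is given explicitly.
arrival = pd.to_datetime(df["arrivalTime"], format="%d/%m/%Y %H:%M:%S")

# Time elapsed since midnight of each row's own day (a timedelta Series).
since_midnight = arrival - arrival.dt.normalize()

# Half-open interval: 16:00:00 is included, 20:00:00 is excluded.
start = pd.Timedelta(hours=16)
end = pd.Timedelta(hours=20)
mask = (since_midnight >= start) & (since_midnight < end)

df["result"] = mask.astype(int)
print(df)
```

Changing `<` to `<=` in the mask would make the end of the interval inclusive.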

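A similar solution uses dt.time: extract the times of day and compare them with datetime.time objects. Here both endpoints are inclusive; replace <= h2 with < h2 to exclude exactly 20:00:00: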

import datetime as dt

df['arrivalTime'] = pd.to_datetime(df['arrivalTime'])

# Interval bounds as datetime.time objects
h1 = dt.time(16, 0, 0)
h2 = dt.time(20, 0, 0)
# Series of datetime.time objects, one per row
times = df['arrivalTime'].dt.time
mask = (times >= h1) & (times <= h2)
df['result'] = mask.astype(int)
print(df)
          arrivalTime  result
0 2016-01-11 06:53:15       0
1 2016-01-11 06:55:25       0
2 2016-03-11 16:57:10       1
3 2016-03-11 16:57:30       1
4 2016-04-11 18:58:10       1
5 2016-06-11 19:58:35       1
6 2016-08-11 20:00:05       0

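Another solution uses between_time, which works only with a DatetimeIndex and includes both endpoints by default. Its output is assigned to the result column, and the rows it did not select (left as NaN) are detected with notnull: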

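df['arrivalTime'] = pd.to_datetime(df['arrivalTime'])
df['result'] = 1
df.set_index('arrivalTime', inplace=True)
# Keep only the selected rows; the assignment aligns on the index,
# so rows outside the interval become NaN (assumes unique timestamps)
df['result'] = df.between_time('16:00', '20:00')['result']
df['result'] = df['result'].notnull().astype(int)
print(df)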
                     result
arrivalTime                
2016-01-11 06:53:15       0
2016-01-11 06:55:25       0
2016-03-11 16:57:10       1
2016-03-11 16:57:30       1
2016-04-11 18:58:10       1
2016-06-11 19:58:35       1
2016-08-11 20:00:05       0
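
If you would rather keep arrivalTime as an ordinary column, a possible alternative is DatetimeIndex.indexer_between_time, which returns the integer positions of the matching rows. This is only a sketch, not part of the original answer: it rebuilds the question's sample data, assumes day/month/year strings, and keeps the default behaviour of including both endpoints. The names positions and result are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "arrivalTime": [
        "01/11/2016 06:53:15",
        "01/11/2016 06:55:25",
        "03/11/2016 16:57:10",
        "03/11/2016 16:57:30",
        "04/11/2016 18:58:10",
        "06/11/2016 19:58:35",
        "08/11/2016 20:00:05",
    ]
})

# Assumption: the strings are day-first, so the format is given explicitly.
arrival = pd.to_datetime(df["arrivalTime"], format="%d/%m/%Y %H:%M:%S")

# Integer positions of the rows whose time of day lies in 16:00-20:00
# (both endpoints are included by default).
positions = pd.DatetimeIndex(arrival).indexer_between_time("16:00", "20:00")

# Start from all zeros and flag the matching positions with 1.
result = np.zeros(len(df), dtype=int)
result[positions] = 1
df["result"] = result
print(df)
```

Because the flags are set by position, this variant does not depend on the timestamps being unique.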