emax - 7 months ago

Python: how to group a pandas DataFrame by a certain time window?

Hello, I have a dataframe df containing data on different trips from an origin X to a destination Y with starting time T. I want to count the trips between X and Y within a certain time window, say 15 min. So,

df:
X Y T
1 2 2015-12-30 22:30:00.0
1 2 2015-12-30 22:35:00.0
1 2 2015-12-30 22:40:00.0
1 2 2015-12-30 23:40:00.0
3 5 2015-11-30 13:40:00.0
3 5 2015-11-30 13:44:00.0
3 5 2015-11-30 19:54:00.0


I want

dfO:
X Y count
1 2 3
3 5 2


To count all the trips from X to Y, I did:

tmp = df.groupby(["X", "Y"]).size()


How can I also take into account that I only want to count trips between the same X and Y that fall within a certain time interval dt?

Answer

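Perhaps you are looking for pd.TimeGrouper (deprecated in pandas 0.21 and removed in later releases, where pd.Grouper(freq=...) is the replacement). It lets you group the rows of a DataFrame by intervals of time, provided that the DataFrame has a DatetimeIndex: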

import pandas as pd

df = pd.DataFrame({'T': ['2015-12-30 22:30:00.0',
                         '2015-12-30 22:35:00.0',
                         '2015-12-30 22:40:00.0',
                         '2015-12-30 23:40:00.0',
                         '2015-11-30 13:40:00.0',
                         '2015-11-30 13:44:00.0',
                         '2015-11-30 19:54:00.0'],
                   'X': [1, 1, 1, 1, 3, 3, 3],
                   'Y': [2, 2, 2, 2, 5, 5, 5]})
df['T'] = pd.to_datetime(df['T'])
df = df.set_index(['T'])
# pd.Grouper(freq=...) replaces pd.TimeGrouper, which was removed from newer pandas releases
result = df.groupby([pd.Grouper(freq='15min'), 'X', 'Y']).size()
print(result)

yields

T                    X  Y
2015-11-30 13:30:00  3  5    2
2015-11-30 19:45:00  3  5    1
2015-12-30 22:30:00  1  2    3
2015-12-30 23:30:00  1  2    1

This contains the information that you want

T                    X  Y
2015-11-30 13:30:00  3  5    2
2015-12-30 22:30:00  1  2    3

and more. It is unclear on what basis you wish to exclude the other rows. If you explain the criterion, we should be able to produce the desired DataFrame exactly.
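
One possible reading, offered only as a guess at the criterion: for each (X, Y) pair, count the trips that start within dt of that pair's earliest trip. The sketch below assumes that interpretation. The names dt, first and in_window are illustrative and do not come from the question. Under this assumption the sample data gives the counts shown in dfO.

```python
import pandas as pd

df = pd.DataFrame({
    'X': [1, 1, 1, 1, 3, 3, 3],
    'Y': [2, 2, 2, 2, 5, 5, 5],
    'T': pd.to_datetime(['2015-12-30 22:30:00.0',
                         '2015-12-30 22:35:00.0',
                         '2015-12-30 22:40:00.0',
                         '2015-12-30 23:40:00.0',
                         '2015-11-30 13:40:00.0',
                         '2015-11-30 13:44:00.0',
                         '2015-11-30 19:54:00.0']),
})

# Assumed criterion: keep trips starting less than dt after the
# earliest trip of the same (X, Y) pair.
dt = pd.Timedelta(minutes=15)

# Earliest start time of each (X, Y) pair, aligned row by row with df
first = df.groupby(['X', 'Y'])['T'].transform('min')

# Rows that fall inside the window [first, first + dt)
in_window = df[df['T'] < first + dt]

# Count the remaining trips per pair and flatten to a regular DataFrame
dfO = in_window.groupby(['X', 'Y']).size().reset_index(name='count')
print(dfO)
```

Unlike the fixed 15-minute bins above, which are aligned to clock boundaries such as 22:30 and 22:45, this window starts at each pair's first trip, so trips that straddle a bin edge are still counted together.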