user2333196 - 4 months ago 7
Python Question

count of last n days per group

I have a

`DataFrame`
like this

``````df = pd.DataFrame({'Team':['CHI','IND','CHI','CHI','IND','CHI','CHI','IND'],
'Date':[datetime.date(2015,10,27),datetime.date(2015,10,28),datetime.date(2015,10,29),datetime.date(2015,10,30),datetime.date(2015,11,1),datetime.date(2015,11,2),datetime.date(2015,11,4),datetime.date(2015,11,4)]})
``````

I can find the number of rest days between games using this.

``````df['TeamRest'] = df.groupby('Team')['Date'].diff() - datetime.timedelta(1)
``````

I would like to also add a row to the
`DataFrame`
that keeps track of how many games each team has played in the last 5 days.

With `Date` converted to `datetime` so it can be used as `DateTimeIndex`, which will be important for the `rolling_count` with `daily frequency`

``````df.Date = pd.to_datetime(df.Date)
``````

1) calculate the difference in days between games per team:

``````df['days_between'] = df.groupby('Team')['Date'].diff() - timedelta(days=1)
``````

2) calculate the rolling count of games for the last 5 days per team:

``````df['game_count'] = 1
rolling_games_count = df.set_index('Date').groupby('Team').apply(lambda x: pd.rolling_count(x, window=5, freq='D')).reset_index()
df = df.drop('game_count', axis=1).merge(rolling_games_count, on=['Team', 'Date'], how='left')
``````

to get:

``````        Date Team  days_between  game_count
0 2015-10-27  CHI           NaT           1
1 2015-10-28  IND           NaT           1
2 2015-10-29  CHI        1 days           2
3 2015-10-30  CHI        0 days           3
4 2015-11-01  IND        3 days           2
5 2015-11-02  CHI        2 days           3
6 2015-11-04  CHI        1 days           2
7 2015-11-04  IND        2 days           2
``````

If you were to

``````df = pd.DataFrame({'Team':['CHI','IND','CHI','CHI','IND','CHI','CHI','IND'], 'Date': [date(2015,10,27),date(2015,10,28),date(2015,10,29),date(2015,10,30),date(2015,11,1),date(2015,11,2),date(2015,11,4),date(2015,12,10)]})
df['game'] = 1  # initialize a game to count.
df['nb_games'] = df.groupby('Team')['game'].apply(pd.rolling_count, 5)
``````

you get the surprising result (one `Date` changed to one month later)

``````         Date Team  game  nb_games
0  2015-10-27  CHI     1         1
2  2015-10-29  CHI     1         2
3  2015-10-30  CHI     1         3
5  2015-11-02  CHI     1         4
6  2015-11-04  CHI     1         5
1  2015-10-28  IND     1         1
4  2015-11-01  IND     1         2
7  2015-12-10  IND     1         3
``````

of `nb_games=3` for a later date in December, when there were no games during the last five days. Unless you convert to `datetime`, you only count the last five entries in the `DataFrame`, so you'll always get five for a team with more than five games played.

Source (Stackoverflow)