user2333196 user2333196 - 5 months ago 18
Python Question

count of last n days per group

I have a

DataFrame
like this

df = pd.DataFrame({'Team':['CHI','IND','CHI','CHI','IND','CHI','CHI','IND'],
'Date':[datetime.date(2015,10,27),datetime.date(2015,10,28),datetime.date(2015,10,29),datetime.date(2015,10,30),datetime.date(2015,11,1),datetime.date(2015,11,2),datetime.date(2015,11,4),datetime.date(2015,11,4)]})


I can find the number of rest days between games using this.

df['TeamRest'] = df.groupby('Team')['Date'].diff() - datetime.timedelta(1)


I would like to also add a row to the
DataFrame
that keeps track of how many games each team has played in the last 5 days.

Answer

With Date converted to datetime so it can be used as DateTimeIndex, which will be important for the rolling_count with daily frequency

df.Date = pd.to_datetime(df.Date)

1) calculate the difference in days between games per team:

df['days_between'] = df.groupby('Team')['Date'].diff() - timedelta(days=1)

2) calculate the rolling count of games for the last 5 days per team:

df['game_count'] = 1
rolling_games_count = df.set_index('Date').groupby('Team').apply(lambda x: pd.rolling_count(x, window=5, freq='D')).reset_index()
df = df.drop('game_count', axis=1).merge(rolling_games_count, on=['Team', 'Date'], how='left')

to get:

        Date Team  days_between  game_count
0 2015-10-27  CHI           NaT           1
1 2015-10-28  IND           NaT           1
2 2015-10-29  CHI        1 days           2
3 2015-10-30  CHI        0 days           3
4 2015-11-01  IND        3 days           2
5 2015-11-02  CHI        2 days           3
6 2015-11-04  CHI        1 days           2
7 2015-11-04  IND        2 days           2

If you were to

df = pd.DataFrame({'Team':['CHI','IND','CHI','CHI','IND','CHI','CHI','IND'], 'Date': [date(2015,10,27),date(2015,10,28),date(2015,10,29),date(2015,10,30),date(2015,11,1),date(2015,11,2),date(2015,11,4),date(2015,12,10)]})
df['game'] = 1  # initialize a game to count.
df['nb_games'] = df.groupby('Team')['game'].apply(pd.rolling_count, 5)

you get the surprising result (one Date changed to one month later)

         Date Team  game  nb_games
0  2015-10-27  CHI     1         1
2  2015-10-29  CHI     1         2
3  2015-10-30  CHI     1         3
5  2015-11-02  CHI     1         4
6  2015-11-04  CHI     1         5
1  2015-10-28  IND     1         1
4  2015-11-01  IND     1         2
7  2015-12-10  IND     1         3

of nb_games=3 for a later date in December, when there were no games during the last five days. Unless you convert to datetime, you only count the last five entries in the DataFrame, so you'll always get five for a team with more than five games played.