pixiesweet44 pixiesweet44 - 28 days ago 16
Python Question

python - add indicator for 10 most recent dates

I'm using Python, and I have data with a team name and dates of games that have been played, it looks something like this (except there are a few hundred rows):

team date
0 TOR 2016/10/15
1 LAK 2016/10/20
2 CGY 2016/11/03
3 BUF 2016/10/30
4 PIT 2016/10/27
5 CHI 2016/11/05
6 VAN 2016/10/20
7 BUF 2016/10/16
8 STL 2016/10/13
9 BUF 2016/10/29
10 MIN 2016/10/29
11 PIT 2016/11/05
12 CHI 2016/10/18
13 BOS 2016/10/29
14 PIT 2016/10/20
15 COL 2016/10/20
16 MTL 2016/10/20
17 MTL 2016/11/05
18 BOS 2016/11/03
19 EDM 2016/11/05
20 NSH 2016/11/01


I would like to add indicator columns to show which are the most recent 10 games for each team, as well as the most recent 5 games for each team. With a 1 if they are in this group, and a 0 if they are not.

I'm stumped. Any ideas would be much appreciated!

Answer

I think you can use SeriesGroupBy.nsmallest with numpy.where for selecting indices by isin:

df.date = pd.to_datetime(df.date)
#in real data use nsmallest(10)
idx = df.groupby('team')['date'].nsmallest(2).index.get_level_values(1)
df['indicator'] = np.where(df.index.isin(idx), 1, 0)

print (df)
   team       date  indicator
0   TOR 2016-10-15          1
1   LAK 2016-10-20          1
2   CGY 2016-11-03          1
3   BUF 2016-10-30          0
4   PIT 2016-10-27          1
5   CHI 2016-11-05          1
6   VAN 2016-10-20          1
7   BUF 2016-10-16          1
8   STL 2016-10-13          1
9   BUF 2016-10-29          1
10  MIN 2016-10-29          1
11  PIT 2016-11-05          0
12  CHI 2016-10-18          1
13  BOS 2016-10-29          1
14  PIT 2016-10-20          1
15  COL 2016-10-20          1
16  MTL 2016-10-20          1
17  MTL 2016-11-05          1
18  BOS 2016-11-03          1
19  EDM 2016-11-05          1
20  NSH 2016-11-01          1
Comments