pixiesweet44 pixiesweet44 - 26 days ago 4
Python Question

python - make new column summing values from columnA with conditions in columnB

I have a pandas dataframe that looks like this:

team W L GF GA date home_ind last10
67 ARI 1 0 3 2 2016-11-01 1 1
99 ARI 1 0 2 2 2016-11-03 1 1
129 ARI 1 0 4 3 2016-10-15 1 1
171 ARI 1 0 5 4 2016-10-27 0 1
241 ARI 0 10 1 5 2016-11-04 0 0
316 ARI 0 10 3 5 2016-10-25 0 1
331 ARI 0 10 2 3 2016-10-21 0 1
334 ARI 0 10 2 3 2016-10-29 1 1
335 ARI 0 10 2 5 2016-10-20 0 1
340 ARI 0 10 4 7 2016-10-18 0 1
341 ARI 0 10 2 3 2016-10-23 0 1


I have this information for 30 different teams.

What I would like to do is make additional columns summing the values from one column, based on conditions from other columns.

For example, I would like a new column that adds the values from GF, but ONLY when home_ind = 1 AND last10 = 1 AND team = ARI. The value that results would be the same value for the column for each team. So for the example that I laid out, the results would look like this:

team W L GF GA date home_ind last10 GF_H_10
67 ARI 1 0 3 2 2016-11-01 1 1 11
99 ARI 1 0 2 2 2016-11-03 1 1 11
129 ARI 1 0 4 3 2016-10-15 1 1 11
171 ARI 1 0 5 4 2016-10-27 0 1 0
241 ARI 0 10 1 5 2016-11-04 0 0 0
316 ARI 0 10 3 5 2016-10-25 0 1 0
331 ARI 0 10 2 3 2016-10-21 0 1 0
334 ARI 0 10 2 3 2016-10-29 1 1 11
335 ARI 0 10 2 5 2016-10-20 0 1 0
340 ARI 0 10 4 7 2016-10-18 0 1 0
341 ARI 0 10 2 3 2016-10-23 0 1 0

Answer

The other solutions here are specific to the ARI team. This performs a groupby on the team, allowing the operation to be completed for the other 30 teams. I'm not sure which you're after.

Performing the groupby on team and then joining the results to the original dataframe is the main idea behind this solution. There is cleanup afterward on the eligibility criteria you define.

import pandas as pd

# sample data
df = pd.DataFrame({'team':['ARI']*11+['BWI']*4,
                   'W':[1]*4+[0]*7+[1,1,0,0],
                   'GF':[3,2,4,5,1,3,2,2,2,4,2,2,2,2,2],
                   'GA':[2,2,3,4,5,5,3,3,5,7,3,1,1,1,1],
                   'home_ind':[1,1,1,0,0,0,0,1,0,0,0,1,1,0,0],
                   'last10':[1]*4+[0]+[1]*6+[1,0,1,1]})

# define a mask
df2 = df.assign(elig=(df['home_ind'] == 1) & (df['last10'] == 1))


# group on team and join the results to the original dataframe
df2 = df2.join(df2[df2['elig']].groupby('team')['GF'].sum(), on='team', rsuffix='_H_10')

# clean up the result column
df2.loc[~df2['elig'], 'GF_H_10'] = 0

Given the dataframe

    GA  GF  W  home_ind  last10 team
0    2   3  1         1       1  ARI
1    2   2  1         1       1  ARI
2    3   4  1         1       1  ARI
3    4   5  1         0       1  ARI
4    5   1  0         0       0  ARI
5    5   3  0         0       1  ARI
6    3   2  0         0       1  ARI
7    3   2  0         1       1  ARI
8    5   2  0         0       1  ARI
9    7   4  0         0       1  ARI
10   3   2  0         0       1  ARI
11   1   2  1         1       1  BWI
12   1   2  1         1       0  BWI
13   1   2  0         0       1  BWI
14   1   2  0         0       1  BWI

The output is

    GA  GF  W  home_ind  last10 team   elig  GF_H_10
0    2   3  1         1       1  ARI   True       11
1    2   2  1         1       1  ARI   True       11
2    3   4  1         1       1  ARI   True       11
3    4   5  1         0       1  ARI  False        0
4    5   1  0         0       0  ARI  False        0
5    5   3  0         0       1  ARI  False        0
6    3   2  0         0       1  ARI  False        0
7    3   2  0         1       1  ARI   True       11
8    5   2  0         0       1  ARI  False        0
9    7   4  0         0       1  ARI  False        0
10   3   2  0         0       1  ARI  False        0
11   1   2  1         1       1  BWI   True        2
12   1   2  1         1       0  BWI  False        0
13   1   2  0         0       1  BWI  False        0
14   1   2  0         0       1  BWI  False        0
Comments