Ryan Erwin Ryan Erwin - 1 year ago 66
Python Question

Assign new value in DataFrame column based on group max

I have a data frame and I'd like to add a new column based the value in two columns, where one uses the group max. For example, here's an example


import pandas as pd
import numpy as np

df = pd.DataFrame({"col1": ['a'] * 2 + ['b'] * 4,
"col2": [1, 2, 1, 2, 3, 4],
"col3": ['yep', 'no', 'no', 'no', 'yep', 'yep']})

col1 col2 col3
0 a 1 yep
1 a 2 no
2 b 1 no
3 b 2 no
4 b 3 yep
5 b 4 yep

Now, what I'd like is a new column that said "yes" where
is the group max, grouping on
is equal to "yep".

col1 col2 col3 col4
0 a 1 yep no
1 a 2 no no
2 b 1 no no
3 b 2 no no
4 b 3 yep no
5 b 4 yep yes

I've tried the code below, but I'm getting a slight error on the return value:

# define function
def max_bal(df):
max_row = df['col2'].max()
label = np.where((df['col3'] == 'yep') &
(df['col2'] == max_row),
return label

Now call function in a group by statement


a [No, No]
b [No, No, No, Yes]
dtype: object

These are the correct values but not returned in a
as I would expect. I'd need one
because I'm trying to assign the these values to a new column in the
above (see `col4'). Any pointers would be greatly appreciated.

Answer Source

You can return labels as a DataFrame with the same index as the group the function is being applied on.

def max_bal(df):
    max_row = df['col2'].max()
    labels = np.where((df['col3'] == 'yep') & 
                     (df['col2'] == max_row),
    return pd.DataFrame(labels, index=df.index)

df['col4'] = df.groupby('col1').apply(max_bal)