Hema Abey, 4 years ago
Python Question

How to apply a function to a pandas DataFrame with groupby

I have this function I found on GitHub.

def std_div(data, threshold=3):
    std = data.std()
    mean = data.mean()
    isOutlier = []
    for val in data:
        if val/std > threshold:
            isOutlier.append(True)
        else:
            isOutlier.append(False)
    return isOutlier


I want to apply this to my DataFrame for each group (dept).

employee_id dept    Salary
1           sales   10000
2           sales   110000
3           sales   120000
4           hr      5000
5           hr      6000


This works, but it calculates the standard deviation over the entire DataFrame instead of per dept.

df["std_div"]= df.from_dict(std_div(df.Salary))

Answer

You could group by the column of interest and then loop over the groups, running the function on that column for each group and writing the result back by the group's index:

for name, group in df.groupby('dept'):
    df.loc[group.index, 'outlier'] = std_div(group.Salary)

df
employee_id dept    Salary  outlier
1           sales   10000   False
2           sales   110000  False
3           sales   120000  False
4           hr      5000    True
5           hr      6000    True
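
The same per-group result can probably also be obtained without an explicit loop by using groupby(...).transform. This is only a minimal sketch: it assumes the sample data from the question, copies std_div unchanged, and wraps the returned list in a Series so that it lines up with each group's index.

```python
import pandas as pd

def std_div(data, threshold=3):
    # Copied from the question; note that mean is computed but never used.
    std = data.std()
    mean = data.mean()
    isOutlier = []
    for val in data:
        if val / std > threshold:
            isOutlier.append(True)
        else:
            isOutlier.append(False)
    return isOutlier

# Sample data from the question
df = pd.DataFrame({
    "employee_id": [1, 2, 3, 4, 5],
    "dept": ["sales", "sales", "sales", "hr", "hr"],
    "Salary": [10000, 110000, 120000, 5000, 6000],
})

# transform calls the function once per dept and aligns the
# result with the original row order of df
df["outlier"] = (
    df.groupby("dept")["Salary"]
      .transform(lambda s: pd.Series(std_div(s), index=s.index))
      .astype(bool)
)
```

On the sample data this should give the same outlier column as the loop above.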

Depending on what you would like the output to be, you can assign the return values back to the original DataFrame, as done here with df.loc.
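
One thing to be aware of: std_div divides the raw value by the standard deviation and never uses mean, which is why both hr rows come out as True. If what you actually want is the usual z-score test (distance from the group mean, measured in group standard deviations), which is an assumption about intent and not something the question states, a vectorised sketch could look like this. The column names z_score and outlier_z are made up for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "employee_id": [1, 2, 3, 4, 5],
    "dept": ["sales", "sales", "sales", "hr", "hr"],
    "Salary": [10000, 110000, 120000, 5000, 6000],
})

threshold = 3  # same default as std_div in the question

grouped = df.groupby("dept")["Salary"]
group_mean = grouped.transform("mean")  # per-dept mean, one value per row
group_std = grouped.transform("std")    # per-dept sample std (ddof=1), like Series.std()

# Hypothetical column names, chosen for this example only
df["z_score"] = (df["Salary"] - group_mean) / group_std
df["outlier_z"] = df["z_score"].abs() > threshold
```

With only two or three rows per group, no z-score can get anywhere near 3, so on this sample nothing is flagged; you would need larger groups or a lower threshold for this test to be useful.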
