Jeff Jeff - 7 days ago 5x
Python Question

Pandas Grouping - Creating a Generic Aggregation Function

I need to do a lot of aggregation on data and I was hoping to write a function that would allow me to pass

1) The string to use for grouping
2) The fields that would constitute the numerator/denominator/ and formula

As I will be doing a lot of cuts on the data using different groupings and different numerators and denominators, it would be easier for me to create a generic group by and pass it what I need

So lets take the following example:

import pandas as pd
df=pd.read_csv("", sep=',')
(df.groupby(['sex', 'smoker'])[['total_bill','tip']].sum().apply(lambda r: r.tip/r.total_bill, axis = 1))

Now, I would want to create a function that would allow me to pass a group by value and a numerator denominator field

So, for example

groupbyvalue=['sex', 'smoker']

And plug them into something like

(df.groupby(groupbyvalue)[fieldstoaggregate].sum().apply(lambda r: r.tip/r.total_bill, axis = 1))

That works fine, but when I tried to replace the formula with something like:


And then placed it in the formula as follows

(df.groupby(groupbyvalue)[fieldstoaggregate].sum().apply(lambda r: dfformula, axis = 1)*10000)

My output looks as follows:

sex smoker
Female No r.tip/r.total_billr.tip/r.total_billr.tip/
Yes r.tip/r.total_billr.tip/r.total_billr.tip/
Male No r.tip/r.total_billr.tip/r.total_billr.tip/
Yes r.tip/r.total_billr.tip/r.total_billr.tip/
dtype: object

Is there any way to create the calculation dynamically then use it in the formula rather than having it interpreted as a string?



You can achieve this using eval() function

import pandas as pd

df = pd.read_csv("", sep=',')

groupbyvalue = ['sex', 'smoker']
fieldstoaggregate = ['tip','total_bill']
dfformula = "r.tip/r.total_bill"

(df.groupby(groupbyvalue)[fieldstoaggregate].sum().apply(lambda r: eval(dfformula), axis = 1))

The output would be as follows

sex     smoker
Female  No        0.153189
        Yes       0.163062
Male    No        0.157312
        Yes       0.136919
dtype: float64