Alex Alex - 6 months ago 22
Python Question

Pandas Finding Multiple Hierarchy Averages

Let's say I have data like below in a Pandas dataframe:

[Image: sample dataframe, not reproduced here]

I would like to find descriptive statistics (mean, median, standard deviation) of:


  1. unique users per cohort

  2. comments per user per cohort

  3. comments per cohort



So for output, I'd expect to see:


  1. unique users per cohort -> [{a:3},{b:2},...] and then finding descriptive statistics for the series

  2. comments per user per cohort -> [{(a,alex):2},{(b,alex):0},{(a,beth):1},{(b,beth):3}...]

  3. comments per cohort -> [{a:5}, {b:6}...]



I'm using Pandas, and I'm absolutely stuck on how to do something so simple. I was thinking of using .groupby(), but that didn't yield a clear solution. I could do this without Pandas, but I thought these were the kinds of questions a Pandas dataframe was made for!?

Thanks!

Answer

Solution

You could use

df.groupby(['Cohort', 'User']).describe()

or

df.groupby(['Cohort']).describe()
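
As a rough illustration, here is what describe() reports for a text column within each group. The sample data is hypothetical (the original image is not available) and assumes one row per comment, with Cohort, User and Comment columns.

```python
import pandas as pd

# Hypothetical sample data: one row per comment (an assumption, since the
# original image is not available).
df = pd.DataFrame({
    "Cohort": ["a"] * 5 + ["b"] * 6,
    "User": ["alex", "alex", "beth", "carl", "carl",
             "beth", "beth", "beth", "dave", "dave", "dave"],
    "Comment": [f"comment {i}" for i in range(1, 12)],
})

# For a text column, describe() reports count, unique, top and freq per group.
summary = df.groupby("Cohort")["User"].describe()
print(summary[["count", "unique"]])
```

Here "count" is the number of rows in each cohort and "unique" is the number of distinct users.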

To match your desired output, use

df.groupby(['Cohort'])['User'].apply(lambda x: x.describe().loc['unique'])

and

df.groupby(['Cohort', 'User'])['Comment'].apply(lambda x: x.describe().loc['unique'])

and

df.groupby(['Cohort'])['Comment'].apply(lambda x: x.describe().loc['unique'])
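
For the descriptive statistics of each series that the question asks about, one possible sketch is below. It is not the only way to do this. It assumes each row is one comment, uses hypothetical sample data chosen to match the counts in the question, and counts rows with size() rather than distinct values with 'unique'.

```python
import pandas as pd

# Hypothetical sample data: one row per comment (an assumption, since the
# original image is not available).
df = pd.DataFrame({
    "Cohort": ["a"] * 5 + ["b"] * 6,
    "User": ["alex", "alex", "beth", "carl", "carl",
             "beth", "beth", "beth", "dave", "dave", "dave"],
    "Comment": [f"comment {i}" for i in range(1, 12)],
})

# 1. Unique users per cohort.
users_per_cohort = df.groupby("Cohort")["User"].nunique()

# 2. Comments per user per cohort (row count for each Cohort/User pair).
comments_per_user = df.groupby(["Cohort", "User"]).size()

# 3. Comments per cohort (row count for each cohort).
comments_per_cohort = df.groupby("Cohort").size()

# Mean, median and standard deviation of each resulting series.
stats = ["mean", "median", "std"]
for name, series in [
    ("unique users per cohort", users_per_cohort),
    ("comments per user per cohort", comments_per_user),
    ("comments per cohort", comments_per_cohort),
]:
    print(name)
    print(series.agg(stats))
```

Note that Cohort/User pairs with no comments, such as (b, alex), do not appear in the grouped result. If zeros are needed for those pairs, comments_per_user.unstack(fill_value=0) is one way to make them explicit.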