Alex Alex - 8 months ago 24
Python Question

Pandas Finding Multiple Hierarchy Averages

Let's say I have data like below in a Pandas dataframe:

[Image not preserved: sample dataframe with the columns Cohort, User, and Comment]

I would like to find descriptive statistics (mean, median, standard dev) of:

  1. unique users per cohort

  2. comments per user per cohort

  3. comments per cohort

So for output, I'd expect to see:

  1. unique users per cohort -> [{a:3},{b:2},...] and then finding descriptive statistics for the series

  2. comments per user per cohort -> [{(a,alex):2},{(b,alex):0},{(a,beth):1},{(b,beth):3}...]

  3. comments per cohort -> [{a:5}, {b:6}...]

I'm using Pandas, and I'm absolutely stuck on how to do something so simple. I was thinking of using […], but that didn't yield a clear solution. I could do this without Pandas, but I thought these were the kinds of questions a Pandas dataframe was made for!




You could use

df.groupby(['Cohort', 'User']).describe()
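
As a rough illustration of what that call returns, here is a minimal sketch. The sample rows are hypothetical (the original table was not preserved), and it assumes there is one row per comment with `Comment` holding text, on a reasonably recent pandas version.

```python
import pandas as pd

# Hypothetical sample data: one row per comment (an assumption,
# since the table from the question is not available).
df = pd.DataFrame({
    "Cohort": ["a", "a", "a", "b", "b"],
    "User": ["alex", "alex", "beth", "beth", "beth"],
    "Comment": ["hi", "hello", "hey", "yo", "sup"],
})

# One row per (Cohort, User) pair; for a text column, describe()
# reports count, unique, top, and freq.
summary = df.groupby(["Cohort", "User"]).describe()
print(summary)
```

The `count` entry is the number of comments for each (cohort, user) pair, while `unique` is the number of distinct comment values.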



Per your desired output (using .loc, since .ix has been removed from recent versions of pandas):

df.groupby(['Cohort'])['User'].apply(lambda x: x.describe().loc['unique'])


df.groupby(['Cohort', 'User'])['Comment'].apply(lambda x: x.describe().loc['unique'])


df.groupby(['Cohort'])['Comment'].apply(lambda x: x.describe().loc['unique'])
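
An alternative that may be closer to what the question asks for is to build the three series directly with `nunique()` and `count()`, then summarize each one. This is only a sketch under assumptions: the sample rows and the names `users_per_cohort`, `comments_per_user`, `comments_per_cohort`, and `stats` are hypothetical, and it assumes one row per comment.

```python
import pandas as pd

# Hypothetical sample data: one row per comment (an assumption).
df = pd.DataFrame({
    "Cohort": ["a", "a", "a", "a", "b", "b", "b"],
    "User": ["alex", "alex", "beth", "carl", "beth", "beth", "dana"],
    "Comment": ["c1", "c2", "c3", "c4", "c5", "c6", "c7"],
})

# 1. Unique users per cohort.
users_per_cohort = df.groupby("Cohort")["User"].nunique()

# 2. Comments per user per cohort (non-null comment rows per pair).
comments_per_user = df.groupby(["Cohort", "User"])["Comment"].count()

# 3. Comments per cohort.
comments_per_cohort = df.groupby("Cohort")["Comment"].count()

# Mean, median, and standard deviation of each series, side by side.
measures = ["mean", "median", "std"]
stats = pd.DataFrame({
    "users_per_cohort": users_per_cohort.agg(measures),
    "comments_per_user": comments_per_user.agg(measures),
    "comments_per_cohort": comments_per_cohort.agg(measures),
})
print(stats)
```

Note that (cohort, user) pairs with no comments do not appear in `comments_per_user`. If zeros are wanted for them, as in the `(b,alex):0` example, something like `comments_per_user.unstack(fill_value=0)` gives a cohort-by-user table with the missing pairs filled in.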