Vince - 21 days ago

How to get percentage of counts of a column after groupby in Pandas

I'm trying to get the distribution of grades for each rank, for each name in my data.
However, I can't figure out how to get the proportion/percentage of each grade count within its rank group. Here's an example:

df.head()


name  rank  grade
Bob   1     A
Bob   1     A
Bob   1     B
Bob   1     C
Bob   2     B
Bob   3     C
Joe   1     C
Joe   2     B
Joe   2     B
Joe   3     A
Joe   3     B
Joe   3     B


I use
grade_count = df.groupby(['name', 'rank', 'grade'])['grade'].size()
to get the count of each grade within its (name, rank) group:

name  rank  grade
Bob   1     A        2
            B        1
            C        1
      2     B        1
      3     C        1
Joe   1     C        1
      2     B        2
      3     A        1
            B        2


Now, for each count, I'd like its proportion of the (name, rank) group total (i.e. the proportion of a grade within a rank, for each name). This is the output I'd like:

name  rank  grade
Bob   1     A        2   0.5    (Bob @ rank 1 had 4 grades, and 50% of them are A's)
            B        1   0.25
            C        1   0.25
      2     B        1   1
      3     C        1   1
Joe   1     C        1   1
      2     B        2   1
      3     A        1   0.33
            B        2   0.67


I've managed to get the totals of each rank group by using
rank_totals = grade_count.groupby(level=[0, 1]).sum()
which results in:

name  rank
Bob   1       4
      2       1
      3       1
Joe   1       1
      2       2
      3       3
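A self-contained reproduction of the setup so far, assuming the rows listed above are the entire DataFrame, with the syntax of both groupby calls corrected. It rebuilds df, then computes grade_count and rank_totals:

```python
import pandas as pd

# Sample data from the question (assumed to be the whole DataFrame).
df = pd.DataFrame({
    "name": ["Bob"] * 6 + ["Joe"] * 6,
    "rank": [1, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3],
    "grade": ["A", "A", "B", "C", "B", "C", "C", "B", "B", "A", "B", "B"],
})

# Number of rows for each (name, rank, grade) combination.
grade_count = df.groupby(["name", "rank", "grade"])["grade"].size()

# Total number of grades in each (name, rank) group: sum over the first two index levels.
rank_totals = grade_count.groupby(level=[0, 1]).sum()

print(grade_count)
print(rank_totals)
```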


How can I divide the numbers in grade_count by their corresponding rank totals in rank_totals?

Answer

Group your Series by the name and rank index levels and use transform to compute each group's total and broadcast it back to every row of the Series. Then divide the original Series by that result:

grade_count.groupby(level=[0, 1]).transform('sum')
Out[19]: 
name  rank  grade
Bob   1     A        4
            B        4
            C        4
      2     B        1
      3     C        1
Joe   1     C        1
      2     B        2
      3     A        3
            B        3
dtype: int64
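The transformed Series is just rank_totals repeated on every row of grade_count. If you already have rank_totals, as in the question, here is a sketch of the same division that looks up each row's (name, rank) total with reindex instead of transform. This is an alternative, not part of the answer above, and it rebuilds the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Bob"] * 6 + ["Joe"] * 6,
    "rank": [1, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3],
    "grade": ["A", "A", "B", "C", "B", "C", "C", "B", "B", "A", "B", "B"],
})
grade_count = df.groupby(["name", "rank", "grade"])["grade"].size()
rank_totals = grade_count.groupby(level=[0, 1]).sum()

# transform('sum') repeats each group's total on every row of that group.
broadcast = grade_count.groupby(level=[0, 1]).transform("sum")

# Same values from rank_totals: drop the grade level to get each row's
# (name, rank) key, then look up the total for that key.
looked_up = rank_totals.reindex(grade_count.index.droplevel("grade"))

# looked_up lacks the grade level, so divide by its raw values (same row order).
proportions = grade_count / looked_up.to_numpy()
print(proportions)
```

Dividing by .to_numpy() sidesteps index alignment, which is safe here because reindex returns the totals in exactly grade_count's row order.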

grade_count / grade_count.groupby(level=[0, 1]).transform('sum')
Out[20]: 
name  rank  grade
Bob   1     A        0.500000
            B        0.250000
            C        0.250000
      2     B        1.000000
      3     C        1.000000
Joe   1     C        1.000000
      2     B        1.000000
      3     A        0.333333
            B        0.666667
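If you only need the proportions and not the intermediate counts, SeriesGroupBy.value_counts with normalize=True computes them in one step. A sketch, again rebuilding the question's sample data; note this swaps in value_counts for the transform approach above:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Bob"] * 6 + ["Joe"] * 6,
    "rank": [1, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3],
    "grade": ["A", "A", "B", "C", "B", "C", "C", "B", "B", "A", "B", "B"],
})

# Share of each grade within its (name, rank) group.
shares = df.groupby(["name", "rank"])["grade"].value_counts(normalize=True)

# Sanity check: the shares in every (name, rank) group add up to 1.
group_sums = shares.groupby(level=["name", "rank"]).sum()
print(shares)
```

value_counts sorts within each group by frequency, so the row order can differ from grade_count even though the values match.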