Vince - 1 year ago 147
Python Question

How to get percentage of counts of a column after groupby in Pandas

I'm trying to get the distribution of grades for each rank for names in a list of data.
However, I can't figure out how to get the proportion/percentage of each grade count over its rank group. Here's an example:

`df.head()`

``````
name    rank    grade
Bob     1       A
Bob     1       A
Bob     1       B
Bob     1       C
Bob     2       B
Bob     3       C
Joe     1       C
Joe     2       B
Joe     2       B
Joe     3       A
Joe     3       B
Joe     3       B
``````

I use `grade_count = df.groupby(['name', 'rank', 'grade'])['grade'].size()` to get the count of each grade within its (name, rank) group:

``````
name    rank    grade
Bob     1       A     2
                B     1
                C     1
        2       B     1
        3       C     1
Joe     1       C     1
        2       B     2
        3       A     1
                B     2
``````
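
For anyone who wants to reproduce this, here is a minimal sketch. It assumes the twelve rows shown above are the whole dataset, and it calls `.size()` directly on the groupby (without the `['grade']` selection), since `size` counts rows per group either way. The `rows` list is just a hypothetical way of typing the sample in.

```python
import pandas as pd

# Sample rows copied from the table above (assumed to be the full data).
rows = [
    ("Bob", 1, "A"), ("Bob", 1, "A"), ("Bob", 1, "B"), ("Bob", 1, "C"),
    ("Bob", 2, "B"), ("Bob", 3, "C"),
    ("Joe", 1, "C"), ("Joe", 2, "B"), ("Joe", 2, "B"),
    ("Joe", 3, "A"), ("Joe", 3, "B"), ("Joe", 3, "B"),
]
df = pd.DataFrame(rows, columns=["name", "rank", "grade"])

# Count rows per (name, rank, grade). The result is a Series whose index
# is a three-level MultiIndex.
grade_count = df.groupby(["name", "rank", "grade"]).size()
print(grade_count)
```

Individual counts can then be looked up with a tuple key, e.g. `grade_count.loc[("Bob", 1, "A")]`.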

Now, for each count, I'd like to get its proportion within its (name, rank) group (i.e. what share of the grades at a given rank, for a given name, each grade accounts for). This is the output I'd like:

``````
name    rank    grade
Bob     1       A     2    0.5   (Bob @ rank 1 had 4 grades, and 50% of them are A's)
                B     1    0.25
                C     1    0.25
        2       B     1    1
        3       C     1    1
Joe     1       C     1    1
        2       B     2    1
        3       A     1    0.33
                B     2    0.66
``````

I've managed to get the totals of each rank group by using `rank_totals = grade_count.groupby(level=[0, 1]).sum()`, which results in:

``````
name    rank
Bob     1       4
        2       1
        3       1
Joe     1       1
        2       2
        3       3
``````
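
To make the mismatch between the two results concrete, here is a small sketch, again assuming the sample rows above are the full data (the `rows` list is a hypothetical stand-in for however `df` was actually loaded). It only compares the sizes and index depths of the two Series.

```python
import pandas as pd

# Sample rows from the question (assumed to be the full data).
rows = [
    ("Bob", 1, "A"), ("Bob", 1, "A"), ("Bob", 1, "B"), ("Bob", 1, "C"),
    ("Bob", 2, "B"), ("Bob", 3, "C"),
    ("Joe", 1, "C"), ("Joe", 2, "B"), ("Joe", 2, "B"),
    ("Joe", 3, "A"), ("Joe", 3, "B"), ("Joe", 3, "B"),
]
df = pd.DataFrame(rows, columns=["name", "rank", "grade"])
grade_count = df.groupby(["name", "rank", "grade"]).size()

# Sum the counts over the first two index levels (name, rank).
rank_totals = grade_count.groupby(level=[0, 1]).sum()

# The counts have one entry per (name, rank, grade) triple, while the
# totals have one entry per (name, rank) pair.
print(len(grade_count), grade_count.index.nlevels)  # 9 3
print(len(rank_totals), rank_totals.index.nlevels)  # 6 2
```

So each total has to be matched to every grade row that shares its (name, rank) pair before the two can be divided row by row.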

How can I divide the numbers from `grade_count` by their corresponding rank totals in `rank_totals`?

Group `grade_count` by its name and rank levels and use `transform` to compute each group's total, broadcast back onto the index of the original Series. Then divide the original Series by that result:

``````
grade_count.groupby(level=[0, 1]).transform(sum)
Out[19]:
Bob   1     A        4
            B        4
            C        4
      2     B        1
      3     C        1
Joe   1     C        1
      2     B        2
      3     A        3
            B        3
dtype: int64