Vince - 7 months ago 48

Python Question

I'm trying to get the distribution of grades for each rank for names in a list of data.

However, I can't figure out how to get the proportion/percentage of each grade count over its rank group. Here's an example:

`df.head()`

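| name | rank | grade |
| --- | --- | --- |
| Bob | 1 | A |
| Bob | 1 | A |
| Bob | 1 | B |
| Bob | 1 | C |
| Bob | 2 | B |
| Bob | 3 | C |
| Joe | 1 | C |
| Joe | 2 | B |
| Joe | 2 | B |
| Joe | 3 | A |
| Joe | 3 | B |
| Joe | 3 | B |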

I use

`grade_count = df.groupby(['name', 'rank', 'grade'])['grade'].size()`

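| name | rank | grade | count |
| --- | --- | --- | --- |
| Bob | 1 | A | 2 |
|  |  | B | 1 |
|  |  | C | 1 |
|  | 2 | B | 1 |
|  | 3 | C | 1 |
| Joe | 1 | C | 1 |
|  | 2 | B | 2 |
|  | 3 | A | 1 |
|  |  | B | 2 |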

Now, for each size calculated, I'd like to get its proportion of the (name, rank) group total (i.e. what proportion of the grades at a given rank, for a given name, each grade makes up):

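| name | rank | grade | count | proportion |
| --- | --- | --- | --- | --- |
| Bob | 1 | A | 2 | 0.5 |
|  |  | B | 1 | 0.25 |
|  |  | C | 1 | 0.25 |
|  | 2 | B | 1 | 1 |
|  | 3 | C | 1 | 1 |
| Joe | 1 | C | 1 | 1 |
|  | 2 | B | 2 | 1 |
|  | 3 | A | 1 | 0.33 |
|  |  | B | 2 | 0.66 |

(Bob @ rank 1 had 4 grades, and 50% of them are A's.)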

I've managed to get the totals of each rank group by using

`rank_totals = grade_count.groupby(level=[0, 1]).sum()`

| name | rank | total |
| --- | --- | --- |
| Bob | 1 | 4 |
|  | 2 | 1 |
|  | 3 | 1 |
| Joe | 1 | 1 |
|  | 2 | 2 |
|  | 3 | 3 |

How can I divide the numbers in `grade_count` by the matching totals in `rank_totals`?

Answer

Group your data by the name and rank levels, and use `transform` to compute each group's total and broadcast it back to the shape of the original Series. Then divide the original Series by that result:

```
In [19]: grade_count.groupby(level=[0, 1]).transform('sum')
Out[19]:
name  rank  grade
Bob   1     A        4
            B        4
            C        4
      2     B        1
      3     C        1
Joe   1     C        1
      2     B        2
      3     A        3
            B        3
dtype: int64

In [20]: grade_count / grade_count.groupby(level=[0, 1]).transform('sum')
Out[20]:
name  rank  grade
Bob   1     A        0.500000
            B        0.250000
            C        0.250000
      2     B        1.000000
      3     C        1.000000
Joe   1     C        1.000000
      2     B        1.000000
      3     A        0.333333
            B        0.666667
```
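For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question, applies the `transform` approach, and puts counts and proportions side by side. The column labels `count` and `proportion` and the variable names `proportion`, `result`, and `alt` are illustrative and not from the original post. The last line is a cross-check using `value_counts(normalize=True)`, which is a different technique from the one in the answer and is included only for comparison.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "name": ["Bob"] * 6 + ["Joe"] * 6,
        "rank": [1, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3],
        "grade": ["A", "A", "B", "C", "B", "C", "C", "B", "B", "A", "B", "B"],
    }
)

# Number of rows for each (name, rank, grade) combination.
grade_count = df.groupby(["name", "rank", "grade"]).size()

# Total per (name, rank), broadcast back onto every row of grade_count.
rank_totals = grade_count.groupby(level=["name", "rank"]).transform("sum")

# Both Series share the same index, so plain division lines up row by row.
proportion = grade_count / rank_totals

# Counts and proportions side by side (column labels are illustrative).
result = pd.concat([grade_count, proportion], axis=1, keys=["count", "proportion"])
print(result)

# Alternative, not from the original answer: normalized value counts per group.
alt = df.groupby(["name", "rank"])["grade"].value_counts(normalize=True)
```

Because `transform` returns a Series with the same index as its input, no explicit alignment or merge is needed before dividing. `alt` holds the same proportions, though its rows are ordered by frequency within each group rather than by grade.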

Source (Stack Overflow)