tarastar42 - 7 months ago
Python Question

How to calculate a percentage using grouped columns in Pandas python?

Pandas newbie, hitting a simple problem that I can't figure out.

I have a data set of baby names in the US that looks like this:

orig data

I am trying to write a program where I can feed in a list of names and get back the % likelihood that each name is male or female (the year is irrelevant for my purposes right now).

I got as far as writing the groupby and then adding the male and female name counts together.

groupby data

Now all I need is to calc the percentages based on this data. I think it is some kind of transform (right?) but I can't seem to write anything that works. I know just how I would do it in SQL, but I am really trying to figure out Pandas. Some pointers would be greatly appreciated!

Thanks!

Answer

If I understood correctly what you're looking for, I would first fill the missing values with zeros, i.e. n = n.fillna(0) (fillna returns a new DataFrame rather than modifying n in place, so assign the result back). Then calculate the percentages and assign the results to a new column. For the female percentage:

n['%F'] = n[('Count', 'F')] / n['sum'] * 100
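The original screenshots are not available, so here is a minimal end-to-end sketch of the steps above on made-up data. The column names Name, Year, Gender and Count and the sample values are assumptions, not the asker's actual data; n is assumed to be the result of a groupby followed by unstack. The last two lines show a transform-based variant on the long (un-pivoted) data, which may be closer to what the question had in mind.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumed for illustration
df = pd.DataFrame({
    "Name":   ["Alex", "Alex", "Alex", "Mary", "John"],
    "Year":   [2000, 2000, 2001, 2000, 2000],
    "Gender": ["M", "F", "M", "F", "M"],
    "Count":  [300, 100, 100, 500, 400],
})

# Sum counts per name and gender over all years, then pivot gender into columns.
# The columns become a MultiIndex: ('Count', 'F') and ('Count', 'M').
n = df.groupby(["Name", "Gender"])[["Count"]].sum().unstack("Gender")

# Names seen for only one gender have NaN in the other column; treat that as 0
n = n.fillna(0)

# Total per name, then the share of each gender as a percentage
n["sum"] = n["Count"].sum(axis=1)
n["%F"] = n[("Count", "F")] / n["sum"] * 100
n["%M"] = n[("Count", "M")] / n["sum"] * 100

# Alternative with transform, staying in long format (one row per name and gender)
totals = df.groupby(["Name", "Gender"], as_index=False)["Count"].sum()
totals["pct"] = totals["Count"] / totals.groupby("Name")["Count"].transform("sum") * 100
```

To look up a list of names afterwards, something like n.loc[["Alex", "Mary"], ["%F", "%M"]] should work, provided every name in the list is present in the index.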