user3471881 user3471881 - 2 months ago 14
Python Question

Pandas pivot_table to calculate share of margin

I have a

Pandas
DataFrame
named
df
that contains
n
columns
. One of the
columns
is named
COUNT
, which shows how many times values in
A
occurs.
A
contains unique identifiers so every row has the value
1
in the
column
COUNT
. It looks like this:

A B C D E COUNT
id1 cat1 1 a 15 1
id2 cat2 2 b 14 1
id3 cat2 2 c 14 1
id4 cat1 1 d 15 1
id5 cat3 2 e 14 1
.....


Now I want to transform my
df
to look like this:

14 15
cat1_tot NaN 2
cat1_share NaN 1
cat2_tot 2 NaN
cat2_share 0.6666 NaN
cat3_tot 1 NaN
cat3_share 0.3333 NaN
All 3 2


I can get
catx_tot
by using
pd.pivot_table


pd.pivot_table(
df,
values='COUNT',
index=['B'],
columns=['E'],
margins=True,
aggfunc=np.sum
)


But how do I add share to this?

Answer

combine groupby.size with groupby.transform

size = df.groupby(['B', 'E']).size()
sums = size.groupby(level='E').transform(np.sum)
aggd = pd.concat([size, size / sums], axis=1, keys=['total', 'share'])

aggd.unstack().stack(0)

enter image description here


to get the All row

all_ = aggd.groupby(level='E').sum().total.rename(('All', 'total'))
aggd.unstack().stack(0).append(all_)

enter image description here