L Xandor L Xandor - 3 months ago 11
Python Question

Pandas Python - get value counts by grouped values

I have a situation where multiple customer ids can be belong to the same account, and all records are grouped in to one of two groups:

CUSTOMER_ACCOUNT_ID CUSTOMER_ID GROUP
123 555 A
123 556 A
124 557 B
124 558 B
125 559 A


What I want to do is get the count of unique CUSTOMER_ACCOUNT_IDs belonging to each group. That is, I don't care how many customer_ids belong to an account, I just want to see how many accounts is in each group. I'm looking for this output

GROUP COUNT
A 2
B 1


That is, number of unique accounts by group. One way of thinking of it is that I want to collapse or remove the CUSTOMER_ID dimension, so I'm left with

CUSTOMER_ACCOUNT_ID GROUP
123 A
124 B
125 A


and then do a value count of GROUP, but I'm unsure of how to approach this. I did find an ugly way to do this, but I'm new to pandas from using R so I'm guessing there's a more straightforward way that I don't know yet...

Answer

You can use the nunique() method after the grouping:

df.groupby('GROUP')['CUSTOMER_ACCOUNT_ID'].nunique().reset_index()

# GROUP CUSTOMER_ACCOUNT_ID
# 0   A                   2
# 1   B                   1