S Ringne S Ringne - 3 years ago 181
Python Question

Summarize dataframe by grouping on a column with pandas

i have a dataframe

id store val1 val2
1 abc 20 30
1 abc 20 40
1 qwe 78 45
2 dfd 34 45
2 sad 43 45


from this i have to group by on
id
and create a new df, with column,
total_store
and
unique stores
and
non-repeating_stores
, which contains count of such store occurances.
my final output should be

id total_store unique stores non-repeating_stores
1 3 2 1
2 2 2 2


i can get total stores by doing

df.groupby('id')['store'].count()


But how do i get others and form a dataframe out of it

Answer Source

You can use a groupby + agg.

df = df.groupby('id').store.agg(['count', 'nunique', \
                lambda x: x.drop_duplicates(keep=False).size])
df.columns = ['total_store', 'unique stores', 'non-repeating_stores']

df    
    total_store  unique stores  non-repeating_stores
id                                                  
1             3              2                     1
2             2              2                     2

For older pandas versions, passing a dict allows simplifying your code (deprecated in 0.20 and onwards):

agg_funcs = {'total_stores' : 'count', 'unique_stores' : 'nunique', 
         'non-repeating_stores' : lambda x: x.drop_duplicates(keep=False).size
}
df = df.groupby('id').store.agg(agg_funcs)

df 
    total_stores  non-repeating_stores  unique_stores
id                                                   
1              3                     1              2
2              2                     2              2

As a slight improvement with speed, you can employ the use of drop_duplicates' sister method, duplicated, in this fashion, as documented by jezrael:

lambda x: (~x.duplicated(keep=False)).sum()

This would replace the third function in agg, with a 20% speed boost over large data of size 1000000:

1 loop, best of 3: 7.31 s per loop 

v/s

1 loop, best of 3: 5.19 s per loop 
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download