Edamame - 7 days ago
Python Question

How to retrieve the aggregated values in a pandas grouped data frame

I have a data frame my_df:

id color
--------------------
001 red
001 blue
001 yellow
002 green
002 black
003 yellow
003 white
003 blue


Then I did:

grouped_df = my_df.groupby('id')
a = grouped_df['id'].apply(lambda x: set(x.tolist()))


Then a looks like this:

id
--------------------------------
001 {red,blue,yellow}
002 {green,black}
003 {yellow,white,blue}


How do I loop over a so I can find the corresponding set for each id? Thanks!

Answer

Try applying set directly to the color column of the groupby:

my_df.groupby('id').color.apply(set)

id
1      {blue, red, yellow}
2           {black, green}
3    {white, yellow, blue}
Name: color, dtype: object

Explanation
The key difference between your code and mine is that I selected the color column from the grouping with .color and then applied set. This ensures that set is applied to each group's color values as a Series, not to a DataFrame.
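
To make the column selection concrete, here is a minimal sketch that rebuilds the sample data from the question and compares selecting id with selecting color before applying set. Storing the ids as strings is an assumption (made to keep the leading zeros), and the names ids_per_group and colors_per_group are only illustrative.

```python
import pandas as pd

# Sample data from the question; ids are stored as strings (an assumption)
# so that the leading zeros survive.
my_df = pd.DataFrame({
    "id": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "color": ["red", "blue", "yellow", "green", "black",
              "yellow", "white", "blue"],
})

# Selecting the id column collects the ids themselves, one per group.
ids_per_group = my_df.groupby("id")["id"].apply(set)

# Selecting the color column collects the colors that belong to each id.
colors_per_group = my_df.groupby("id")["color"].apply(set)

print(ids_per_group)
print(colors_per_group)
```

Bracket selection (["color"]) and attribute selection (.color) pick the same column; the bracket form also works for column names that are not valid Python identifiers.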


If you assign the result of the groupby to a variable, say g:

g = my_df.groupby('id').color.apply(set)

Then each group can be referenced by its index value:

g.loc[1]

{'blue', 'red', 'yellow'}
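
The lookup label has to match the type of the index. The output above shows integer ids, but if the ids are stored as strings such as '001' (an assumption in this sketch), the string is the key. The sketch below also uses Series.get to supply a default for an id that is not present; the id '999' is a made-up example.

```python
import pandas as pd

# Small sample with string ids (an assumption; integer ids work the same way).
my_df = pd.DataFrame({
    "id": ["001", "001", "002"],
    "color": ["red", "blue", "green"],
})
g = my_df.groupby("id")["color"].apply(set)

# The index holds the group keys, so look up with a label of the same type.
colors_001 = g.loc["001"]

# Series.get returns the default instead of raising KeyError for a missing id.
colors_missing = g.get("999", set())

print(colors_001)
print(colors_missing)
```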

You can loop like this:

# Series.items() yields (index, value) pairs; iteritems() was removed in pandas 2.0
for i, v in g.items():
    print(i, v)

1 {'blue', 'red', 'yellow'}
2 {'black', 'green'}
3 {'white', 'yellow', 'blue'}
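
Two other ways to get at the per-id sets, shown as a hedged sketch rather than the one right approach: convert the result to a plain dict, or iterate over the grouped column directly without building the intermediate Series. The sample data and the names color_sets and collected are illustrative, and the string ids are an assumption.

```python
import pandas as pd

# Sample data from the question; ids stored as strings (an assumption).
my_df = pd.DataFrame({
    "id": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "color": ["red", "blue", "yellow", "green", "black",
              "yellow", "white", "blue"],
})

# Option 1: turn the aggregated Series into a dict keyed by id.
color_sets = my_df.groupby("id")["color"].apply(set).to_dict()

# Option 2: iterate over the grouped column; each step yields the group key
# and the Series of colors belonging to that key.
collected = {}
for group_id, colors in my_df.groupby("id")["color"]:
    collected[group_id] = set(colors)

for group_id, colors in color_sets.items():
    print(group_id, colors)
```

A dict is convenient when you only need lookups by id afterwards; iterating over the groupby is useful when each group needs more processing than a single aggregation.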