Edamame - 4 months ago
Python Question

How to retrieve the aggregated value in a pandas grouped DataFrame

I have a data frame


id color
001 red
001 blue
001 yellow
002 green
002 black
003 yellow
003 white
003 blue

Then I did:

grouped_df = my_df.groupby('id')
a = grouped_df['color'].apply(lambda x: set(x.tolist()))

Then a looks like this:

001 {red,blue,yellow}
002 {green,black}
003 {yellow,white,blue}

How do I loop over a, so I can find the corresponding set for each id? Thanks!


Try applying set with a groupby:

my_df.groupby('id').color.apply(set)

1      {blue, red, yellow}
2           {black, green}
3    {white, yellow, blue}
Name: color, dtype: object

The key difference between what you did and what I did is that I narrowed the grouping to the color column with .color and then applied set. This ensures that set is applied to each group's Series of colors and not to a DataFrame.
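To make the column selection concrete, here is a minimal, self-contained sketch. The DataFrame is a reconstruction of the data in the question, and keeping the ids as strings (so the leading zeros survive) is an assumption, not something the question states.

```python
import pandas as pd

# Reconstruction of the question's data (assumption: ids are strings).
my_df = pd.DataFrame({
    "id": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "color": ["red", "blue", "yellow", "green", "black",
              "yellow", "white", "blue"],
})

# Select the 'color' column from the groupby, then build one set per id.
# ["color"] is equivalent to the .color attribute access used above.
g = my_df.groupby("id")["color"].apply(set)

print(type(g).__name__)  # Series
print(g)
```

The result is a Series indexed by id, where each value is a Python set of that id's colors.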

If you assign the result of the groupby to a variable, say g:

g = my_df.groupby('id').color.apply(set)

Then each group can be easily referenced by its index value:

g.loc[1]

{'blue', 'red', 'yellow'}
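As a rough sketch of looking up a single id, the example below rebuilds the data with string ids (an assumption; the output above suggests the answer used integer ids, in which case the label would be 1 rather than "001"). The variable names colors_001, has_blue and missing are illustrative only.

```python
import pandas as pd

# Reconstruction of the question's data (assumption: ids are strings).
my_df = pd.DataFrame({
    "id": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "color": ["red", "blue", "yellow", "green", "black",
              "yellow", "white", "blue"],
})
g = my_df.groupby("id")["color"].apply(set)

# Label-based lookup: the label must match the index dtype exactly.
colors_001 = g.loc["001"]

# Each value is an ordinary Python set, so membership tests work.
has_blue = "blue" in g.loc["003"]

# Series.get returns a default (None here) instead of raising KeyError
# when the id is not present.
missing = g.get("004")

print(sorted(colors_001), has_blue, missing)
```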

You can loop like this:

# Series.iteritems() was removed in pandas 2.0; items() is the replacement
for i, v in g.items():
    print(i, v)

1 {'blue', 'red', 'yellow'}
2 {'black', 'green'}
3 {'white', 'yellow', 'blue'}
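If the sets are needed outside pandas, one option is to convert the Series to a plain dict. The sketch below again assumes string ids and uses hypothetical names (sets_by_id, id_, colors); sorting each set is only there to make the printed order predictable, since sets are unordered.

```python
import pandas as pd

# Reconstruction of the question's data (assumption: ids are strings).
my_df = pd.DataFrame({
    "id": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "color": ["red", "blue", "yellow", "green", "black",
              "yellow", "white", "blue"],
})
g = my_df.groupby("id")["color"].apply(set)

# Iterate over (id, set) pairs directly from the Series.
for id_, colors in g.items():
    print(id_, sorted(colors))

# Or convert once to a dict mapping id -> set for repeated lookups.
sets_by_id = g.to_dict()
```

The dict form is convenient when the lookups happen in plain Python code that should not depend on pandas indexing.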