mike van der naald - 6 months ago
Python Question

Formatting the output from Pandas .groupby.size()

I'm working with pandas and I have a dataframe that looks something like this.

import pandas as pd

df = pd.DataFrame({'AAA': [4, 5, 6, 7], 'BBB': [100, 100, 30, 40], 'CCC': [100, 100, 30, -50]})

I'm using .groupby() and .size() to find duplicate rows, looking only at the 'BBB' and 'CCC' columns, and turning the result into a dataframe like this:


I find the format of this new dataframe, duplicates, hard to work with, even though it contains all the data I need. This is how it looks in the Variable explorer in Spyder:

Index      num
(30,30)    1
(40,-50)   1
(100,100)  2

So the index contains the values of 'BBB' and 'CCC' that were repeated, and num contains how many times they were repeated. I don't know how to access the data in the index and split it into individual columns, so the index is the hardest part to work with. I would really like the output to look like this instead:

Index  'BBB'  'CCC'  num
0      30     30     1
1      40     -50    1
2      100    100    2

Sorry if the formatting is bad; I still haven't figured out how to post well on this site.


Is this (reset_index()) what you want?

In [24]: df.groupby(['BBB','CCC']).size().to_frame('num').reset_index()
   BBB  CCC  num
0   30   30    1
1   40  -50    1
2  100  100    2
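
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The variable names (counts, flat, flat_alt, bbb_values) are only illustrative and are not from the question. It also shows two things the one-liner above does not: Series.reset_index(name=...), which should give the same frame in one step, and get_level_values(), which is one way to read the grouped values directly from the index if you would rather keep it.

```python
import pandas as pd

df = pd.DataFrame({
    'AAA': [4, 5, 6, 7],
    'BBB': [100, 100, 30, 40],
    'CCC': [100, 100, 30, -50],
})

# size() returns a Series whose index is a MultiIndex over ('BBB', 'CCC');
# this is why the index shows up as tuples in the Variable explorer.
counts = df.groupby(['BBB', 'CCC']).size()

# Variant 1 (as in the answer above): name the counts column,
# then move the index levels back into ordinary columns.
flat = counts.to_frame('num').reset_index()

# Variant 2: reset_index(name=...) on the Series names the counts
# and flattens the index in a single call.
flat_alt = counts.reset_index(name='num')

# If the MultiIndex is kept, each level can still be read by name.
bbb_values = counts.index.get_level_values('BBB')

print(flat)
```

Either variant leaves 'BBB', 'CCC' and num as plain columns, so the usual column selection and filtering (for example flat[flat['num'] > 1] to keep only the repeated pairs) works without touching the index.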