muon muon - 2 months ago 40
Python Question

pandas reset_index after groupby.value_counts()

I am trying to groupby a column and compute value counts on another column.

import pandas as pd
dftest = pd.DataFrame({'A':[1,1,1,1,1,1,1,1,1,2,2,2,2,2],
'Amt':[20,20,20,30,30,30,30,40, 40,10, 10, 40,40,40]})

print(dftest)


dftest looks like

A Amt
0 1 20
1 1 20
2 1 20
3 1 30
4 1 30
5 1 30
6 1 30
7 1 40
8 1 40
9 2 10
10 2 10
11 2 40
12 2 40
13 2 40


perform grouping

grouper = dftest.groupby('A')
df_grouped = grouper['Amt'].value_counts()


which gives

A Amt
1 30 4
20 3
40 2
2 40 3
10 2
Name: Amt, dtype: int64


what I want is to keep top two rows of each group

Also, I was perplexed by an error when I tried to
reset_index


df_grouped.reset_index()


which gives following error


df_grouped.reset_index()
ValueError: cannot insert Amt, already exists

Answer

You need parameter name in reset_index, because Series name is same as name of one of levels of MultiIndex:

df_grouped.reset_index(name='count')

Another solution is rename Series name:

print (df_grouped.rename('count').reset_index())

   A  Amt  count
0  1   30      4
1  1   20      3
2  1   40      2
3  2   40      3
4  2   10      2

More common solution instead value_counts is aggregate size:

df_grouped1 =  dftest.groupby(['A','Amt']).size().rename('count').reset_index()

print (df_grouped1)
   A  Amt  count
0  1   20      3
1  1   30      4
2  1   40      2
3  2   10      2
4  2   40      3
Comments