UserYmY UserYmY - 1 month ago 23
Python Question

Python Pandas: Group by and count distinct value over all columns?

I have df

column1 column2 column3 column4 ..
name count count count


and I would like to Group by and count distinct value over all columnsI am trying :

df.groupby('column1').nunique()


but I am receiving this error.

AttributeError: 'DataFrameGroupBy' object has no attribute 'nunique'


Anybody have a suggestion?

Answer

You can use stack for Series and then Series.groupby with SeriesGroupBy.nunique:

df1 = df.set_index('column1').stack()

print (df1.groupby(level=[0,1]).nunique())

Sample:

df = pd.DataFrame({
'column3': {0: 2, 1: 2, 2: 1, 3: 2}, 
'column1': {0: 'name', 1: 'name', 2: 'name1', 3: 'name1'}, 
'column4': {0: 3, 1: 3, 2: 3, 3: 3}, 
'column2': {0: 1, 1: 1, 2: 7, 3: 5}})

print (df)
  column1  column2  column3  column4
0    name        1        2        3
1    name        1        2        3
2   name1        7        1        3
3   name1        5        2        3
df1 = df.set_index('column1').stack()
print (df1)
column1         
name     column2    1
         column3    2
         column4    3
         column2    1
         column3    2
         column4    3
name1    column2    7
         column3    1
         column4    3
         column2    5
         column3    2
         column4    3
dtype: int64

print (df1.groupby(level=[0,1]).nunique())
column1         
name     column2    1
         column3    1
         column4    1
name1    column2    2
         column3    2
         column4    1
dtype: int64

print (df1.groupby(level=[0,1]).nunique().unstack())
         column2  column3  column4
column1                           
name           1        1        1
name1          2        2        1

Another solution with double apply:

print (df.groupby('column1').apply(lambda x: x.iloc[:,1:].apply(lambda y: y.nunique())))
         column2  column3  column4
column1                           
name           1        1        1
name1          2        2        1