Stereo Stereo - 1 month ago 12
Python Question

Collapse duplicate rows with pandas

I have a dataframe that has duplicate column names. I want to collapse all the same entries into a single one.

The csv data of the data would be,

id,col1,col2,col1,col2
'a',1,0,1,0
'b',0,1,1,0
'c',1,0,0,0


The result I am looking for is,

id,col1,col2
'a',2,0
'b',1,1
'c',1,0


I want to sum over the columns.

I am new to pandas and can't seem to find how to aggregate the values correctly. Note that I have about >4000 columns.

Answer

You can use groupby by column names and aggregate sum:

print (df.groupby(level= 0, axis=1).sum())
   col1  col2   id
0     2     0  'a'
1     1     1  'b'
2     1     0  'c'