Stereo - 3 months ago
Python Question

Collapse duplicate columns with pandas

I have a dataframe with duplicate column names. I want to collapse all columns that share a name into a single column.

The CSV data would be,


The result I am looking for is,


I want to sum the values of the columns that share a name.

I am new to pandas and can't work out how to aggregate the values correctly. Note that I have more than 4000 columns.


You can group the columns by name with groupby (level=0 on the column axis) and aggregate each group with sum:

print(df.groupby(level=0, axis=1).sum())
   col1  col2   id
0     2     0  'a'
1     1     1  'b'
2     1     0  'c'
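
For reference, here is a minimal self-contained sketch of the same idea. The sample data is made up (the original CSV is not shown in the question) and is chosen only so that the sums match the output above. It also assumes a recent pandas: as far as I know, passing axis=1 to groupby is deprecated from pandas 2.1 onward, so this version transposes, groups on the index, and transposes back instead. Moving id into the index is my own choice to keep the string column out of the sum.

```python
import pandas as pd

# Hypothetical sample data with a duplicated column name ("col1").
df = pd.DataFrame(
    [["a", 1, 0, 1], ["b", 1, 1, 0], ["c", 1, 0, 0]],
    columns=["id", "col1", "col2", "col1"],
)

# Move the non-numeric id column into the index so only numbers are summed.
values = df.set_index("id")

# Transpose so the duplicate column names become row labels, group on those
# labels, sum each group, then transpose back to the original orientation.
collapsed = values.T.groupby(level=0).sum().T

print(collapsed)
```

With several thousand columns the double transpose should still be fine as long as the value columns are all numeric, since the transposed frame then keeps a single numeric dtype.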