Rtut Rtut - 5 months ago 30
Python Question

Pandas DataFrame: combine duplicate columns into one, separating data by comma

My DataFrame has a few duplicate column names. Where a column name is duplicated, I want to combine the duplicate columns into one and keep the data from each of them, separated by commas. Can anyone suggest a way to do this?

I have constructed an example below. In my actual DataFrame, the column names are not known in advance.

Input DataFrame:

-----------------------------------
Index | Col1 | Col2 | Col3 | Col2 |
-----------------------------------
A     | CA1  | CA2  | CA3  | CA5  |
B     | CB1  | CB2  | CB3  | CB5  |
C     | CC1  | CC2  | CC3  | CC5  |
D     | CD1  | CD2  | CD3  | CD5  |
E     | CE1  | CE2  | CE3  | CE5  |
-----------------------------------


Output DataFrame:

--------------------------------
Index | Col1 | Col2    | Col3 |
--------------------------------
A     | CA1  | CA2,CA5 | CA3  |
B     | CB1  | CB2,CB5 | CB3  |
C     | CC1  | CC2,CC5 | CC3  |
D     | CD1  | CD2,CD5 | CD3  |
E     | CE1  | CE2,CE5 | CE3  |
--------------------------------

Answer

You could also group the columns by name and join the values in each group with a comma:

df.groupby(df.columns, axis=1).agg(lambda x: ','.join(x))

      Col1     Col2 Col3
Index                   
A      CA1  CA2,CA5  CA3
B      CB1  CB2,CB5  CB3
C      CC1  CC2,CC5  CC3
D      CD1  CD2,CD5  CD3
E      CE1  CE2,CE5  CE3
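
For anyone who wants to reproduce this end to end, here is a minimal sketch that avoids grouping along axis=1, which recent pandas releases (2.1 and later, as far as I know) deprecate. The helper name combine_duplicate_columns and its sep parameter are my own and not part of pandas. It assumes the values can sensibly be cast to strings, so a missing value would show up as the text "nan".

```python
import pandas as pd

# Rebuild the example frame from the question, including the duplicate "Col2".
df = pd.DataFrame(
    [
        ["CA1", "CA2", "CA3", "CA5"],
        ["CB1", "CB2", "CB3", "CB5"],
        ["CC1", "CC2", "CC3", "CC5"],
        ["CD1", "CD2", "CD3", "CD5"],
        ["CE1", "CE2", "CE3", "CE5"],
    ],
    index=pd.Index(list("ABCDE"), name="Index"),
    columns=["Col1", "Col2", "Col3", "Col2"],
)


def combine_duplicate_columns(frame, sep=","):
    """Hypothetical helper: merge same-named columns into one string column."""
    combined = {}
    # unique() keeps the column names in order of first appearance.
    for name in frame.columns.unique():
        # A boolean mask always yields a DataFrame, even for a single column.
        block = frame.loc[:, frame.columns == name]
        # Cast to str (assumption: string output is wanted), then join row-wise.
        combined[name] = block.astype(str).apply(sep.join, axis=1)
    return pd.DataFrame(combined, index=frame.index)


result = combine_duplicate_columns(df)
print(result)
```

Because the loop runs over whatever names appear in frame.columns, nothing needs to be known about the column names in advance. Columns that are not duplicated pass through unchanged apart from the cast to string.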