Kemal Diri Kemal Diri - 5 months ago 47
Python Question

How to use self join to reshape duplicate rows in Pandas?

I have a duplicates rows and I want to concatenate them. They are like :

ID Col1 Col2 Col3 ... Col46
-----------------------------------
id1 a1 b1 c1 ... x1
id2 a2 b2 c2 ... x2
id1 a1 b1 c1 ... y1
id3 a3 b3 c3 ... x3
id3 a3 b3 c3 ... y3
id3 a3 b3 c3 ... z3


and what I want to do is :

ID Col1 Col2 Col3 ... Col46 Col47 Col48
----------------------------------------------------
id1 a1 b1 c1 ... x1 y1 None
id2 a2 b2 c2 ... d2 None None
id3 a3 b3 c3 ... x3 y3 z3


for this, I'm using merge :

data_cliq = self.cliq.copy()
self.cliq = pd.merge(self.cliq, data_cliq, on = 'ID', how = 'inner')


but I think I need something more complex than this because it doesn't give me the result that I wish.

Answer

I think you need first create groups by cumcount and then use pivot_table:

df['g'] = df.groupby('ID')['Col46'].cumcount()

df = df.pivot_table(index=['ID','Col1','Col2','Col3'], 
                    columns='g', 
                    values='Col46', 
                    aggfunc=''.join).reset_index()

print (df)

g   ID Col1 Col2 Col3   0     1     2
0  id1   a1   b1   c1  x1    y1  None
1  id2   a2   b2   c2  x2  None  None
2  id3   a3   b3   c3  x3    y3    z3

And if need rename column names:

df['g'] = 'Col' + (df.groupby('ID')['Col46'].cumcount() + 46).astype(str)

df = df.pivot_table(index=['ID','Col1','Col2','Col3'], 
                    columns='g', 
                    values='Col46', 
                    aggfunc=''.join).reset_index()

print (df)
g   ID Col1 Col2 Col3 Col46 Col47 Col48
0  id1   a1   b1   c1    x1    y1  None
1  id2   a2   b2   c2    x2  None  None
2  id3   a3   b3   c3    x3    y3    z3