Kemal Diri - 7 months ago
Python Question

Concatenate pandas DataFrame rows side by side and top to bottom at the same time

I have a problem. I want to create a new DataFrame from another one without duplicate rows: if rows have the same email, they should be concatenated side by side; otherwise they should be stacked top to bottom. The problem is that I get an indexing error every time:

pandas.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Here is what I did:

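# result_df is a hypothetical name; the original variable name was lost
if not result_df.empty:
    if data_frame_['Email'][0] in result_df['Email'].values:
        result_df = pd.concat([result_df, data_frame_], axis=1)  # same email: side by side
    else:
        result_df = pd.concat([result_df, data_frame_], axis=0)  # new email: top to bottom
else:
    result_df = data_frame_.copy()

end = time.time()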

data_frame_ has only one row, which is why I'm indexing it with [0].
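A likely contributor to the error (an assumption, not confirmed in the thread): every one-row data_frame_ carries the index label 0, so stacking them with axis=0 repeats that label, and the error message says reindexing needs a uniquely valued index. The minimal sketch below uses made-up email addresses, because the originals were not preserved. It collects the rows and stacks them once with ignore_index=True, which renumbers them.

```python
import pandas as pd

# Hypothetical one-row frames standing in for successive data_frame_ values;
# the email addresses are placeholders, not the original data.
incoming = [
    pd.DataFrame({'Email': ['a@example.com'], 'Project1': [1], 'Target1': [5000]}),
    pd.DataFrame({'Email': ['b@example.com'], 'Project1': [7], 'Target1': [5000]}),
    pd.DataFrame({'Email': ['a@example.com'], 'Project1': [7], 'Target1': [4000]}),
]

# Each one-row frame has the index label 0, so a plain
# top-to-bottom concat repeats that label.
stacked = pd.concat(incoming, axis=0)
print(stacked.index.tolist(), stacked.index.is_unique)

# ignore_index=True renumbers the rows 0..n-1, giving a unique index.
rows = pd.concat(incoming, ignore_index=True)
print(rows.index.tolist(), rows.index.is_unique)
```

With a unique row index, the side-by-side step can then be done in one pass over rows instead of with repeated axis=1 concats.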


Example of the data (which is in data_frame_):

Email  Project1  Target1  Project2  Target2
-------------------------------------------
          1        5000      NaN       NaN
          7        5000      NaN       NaN
          7        4000      NaN       NaN

What I want is:

Email  Project1  Target1  Project2  Target2
-------------------------------------------
          1        5000        7      4000
          7        5000      NaN       NaN

PS: I could do this with dicts, but to keep the code consistent, I'd like to use DataFrames.

Thank you in advance.


You can use pivot_table, but first number the repeated emails with groupby and cumcount:

#rename columns
df.rename(columns={'Project1':'Project','Target1':'Target'}, inplace=True)

print (df)
                 Email  Project  Target
0        1    5000
1        7    5000
2        7    4000

df['g'] = (df.groupby('Email').cumcount() + 1).astype(str)

df1 = df.pivot_table(index='Email', columns='g', values=['Project', 'Target'])
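# sort the column MultiIndex by the group level, giving Project1, Target1, Project2, Target2
df1 = df1.sort_index(axis=1, level=1)
# flatten the column MultiIndex, e.g. ('Project', '1') -> 'Project1'
df1.columns = [''.join(col) for col in df1.columns]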
print (df1)
       Project1  Target1  Project2  Target2
Email
            7.0   5000.0       NaN      NaN
            1.0   5000.0       7.0   4000.0
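To try the answer end to end, here is a self-contained version of the same steps. The email addresses are hypothetical placeholders because the originals were not preserved, and the construction of df is an assumption about its shape (Email, Project1, Target1).

```python
import pandas as pd

# Placeholder data (hypothetical emails); rows 0 and 2 share an address.
df = pd.DataFrame({
    'Email': ['a@example.com', 'b@example.com', 'a@example.com'],
    'Project1': [1, 7, 7],
    'Target1': [5000, 5000, 4000],
})

# Use generic names; the occurrence number is appended later.
df = df.rename(columns={'Project1': 'Project', 'Target1': 'Target'})

# Number each email's occurrences 1, 2, ... in order of appearance.
df['g'] = (df.groupby('Email').cumcount() + 1).astype(str)

# One row per email, one (value, occurrence) column pair per occurrence.
df1 = df.pivot_table(index='Email', columns='g', values=['Project', 'Target'])

# Order columns by occurrence: Project1, Target1, Project2, Target2.
df1 = df1.sort_index(axis=1, level=1)

# Flatten the column MultiIndex, e.g. ('Project', '1') -> 'Project1'.
df1.columns = [''.join(col) for col in df1.columns]
print(df1)
```

pivot_table aggregates with the mean by default. Because cumcount makes every (Email, g) pair unique, each cell is just the original value, though it may come back as a float, as in the output above.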