Lamakaha - 1 month ago
Python Question

Pandas DataFrame - renaming multiple identically named columns

I have several columns with the same name in a DataFrame and need to rename them. The usual rename() renames all of them at once.
Is there any way I can rename the blah columns below to blah1, blah4, and blah5?

In [6]:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2 * 5).reshape(2, 5))
df.columns = ['blah', 'blah2', 'blah3', 'blah', 'blah']
df
Out[6]:

   blah  blah2  blah3  blah  blah
0     0      1      2     3     4
1     5      6      7     8     9

In [7]:

df.rename(columns={'blah': 'blah1'})
Out[7]:
   blah1  blah2  blah3  blah1  blah1
0      0      1      2      3      4
1      5      6      7      8      9

Answer

I was looking for a solution within pandas rather than a general Python one. When a column name is duplicated, the column index's get_loc() method returns a boolean mask whose True values mark the positions of that name. I then use the mask to assign new values at those positions. In my case I know ahead of time how many duplicates I will get and what I will rename them to, but df.columns.get_duplicates() appears to return a list of all duplicated names, which you can combine with get_loc() if you need a more generic way to weed out duplicates.
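
To make the mask concrete, here is a minimal sketch that applies it to the exact renaming asked for in the question (blah1, blah4, blah5). The variable names mask, positions, wanted and new_cols are illustrative and not from the original post. It assumes get_loc() returns a boolean array, which is what pandas does when the duplicated label is scattered through an unsorted index; when the duplicates sit next to each other in a sorted index, get_loc() can return a slice instead, and this sketch would need adjusting.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2 * 5).reshape(2, 5))
df.columns = ['blah', 'blah2', 'blah3', 'blah', 'blah']

# For a label that occurs several times in an unsorted index, get_loc
# returns a boolean array with True at every matching position.
mask = df.columns.get_loc('blah')

# Turn the mask into integer positions (0, 3 and 4 for this frame).
positions = np.flatnonzero(mask)

# Hand-picked replacement names, one per duplicate, in column order.
wanted = ['blah1', 'blah4', 'blah5']

# Work on a plain list copy of the labels, then assign it back in one step.
new_cols = list(df.columns)
for pos, name in zip(positions, wanted):
    new_cols[pos] = name
df.columns = new_cols

print(list(df.columns))
```

Building a plain list and assigning it back avoids mutating the Index, which pandas does not allow in place.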

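cols = pd.Series(df.columns)
# Index.get_duplicates() was removed in pandas 1.0; duplicated() finds the same names
for dup in df.columns[df.columns.duplicated()].unique():
    mask = df.columns.get_loc(dup)  # boolean mask, True at each column named dup
    cols[mask] = [dup + '.' + str(d_idx) if d_idx != 0 else dup for d_idx in range(mask.sum())]
df.columns = cols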

   blah  blah2  blah3  blah.1  blah.2
0     0      1      2       3       4
1     5      6      7       8       9
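
For comparison, the same suffixing scheme can be sketched without get_loc() at all, which sidesteps the question of whether it returns a mask or a slice. This is a different technique from the answer above, not a restatement of it: it counts names as it walks the list. The helper name dedupe_names and its sep parameter are hypothetical, and the sketch does not guard against a generated name such as blah.1 colliding with a column that already has that name.

```python
from collections import Counter


def dedupe_names(names, sep='.'):
    """Return a new list where the 2nd, 3rd, ... repeat of a name gets a numeric suffix."""
    seen = Counter()  # how many times each name has appeared so far
    out = []
    for name in names:
        count = seen[name]
        # The first occurrence keeps its name; later ones become name.1, name.2, ...
        out.append(name if count == 0 else f"{name}{sep}{count}")
        seen[name] += 1
    return out


renamed = dedupe_names(['blah', 'blah2', 'blah3', 'blah', 'blah'])
print(renamed)  # ['blah', 'blah2', 'blah3', 'blah.1', 'blah.2']
```

Applied to the DataFrame in this thread, that would be df.columns = dedupe_names(df.columns).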