Dubraven93 - 3 months ago
Python Question

Replace a column in Pandas dataframe with another that has same index but in a different order

I'm trying to re-insert into a pandas dataframe a column that I extracted and then reordered by sorting it.

Very simply, I have extracted a column from a pandas df:

col1 = df.col1


This column contains integers. I used the .sort() method to order it from smallest to largest, and then performed some operations on the data.

col1.sort()
#do stuff that changes the values of col1.


Now the indexes of col1 are the same as the indexes of the overall df, but in a different order.

I was wondering how I can insert the column back into the original dataframe, replacing the col1 that is there at the moment.

I have tried both of the following methods:

1)

df.col1 = col1


2)

df.insert(column_index_of_col1, "col1", col1)


but both methods give me the following error:

ValueError: cannot reindex from a duplicate axis


Any help will be greatly appreciated.
Thank you.

Answer

Consider this DataFrame:

df = pd.DataFrame({'A': [1, 2, 3], 'B': [6, 5, 4]}, index=[0, 0, 1])

df
Out: 
   A  B
0  1  6
0  2  5
1  3  4

Assign the second column to b, sort it, and square it, for example:

b = df['B']
b = b.sort_values()
b = b**2

Now b is:

b
Out: 
1    16
0    25
0    36
Name: B, dtype: int64
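
Note that the label 0 appears twice in the index of b. As a rough sketch, one way to check for this situation is to inspect the index directly. The variable names index_is_unique and repeated_labels are only illustrative:

```python
import pandas as pd

# Same frame as above: the label 0 appears twice in the index.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [6, 5, 4]}, index=[0, 0, 1])
b = df['B'].sort_values() ** 2

# True only when every index label occurs exactly once.
index_is_unique = df.index.is_unique

# Labels that occur more than once in b's index.
repeated_labels = b.index[b.index.duplicated()].tolist()

print(index_is_unique, repeated_labels)  # False [0]
```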

Without knowing the exact operation you applied to the column, there is no way to tell whether 25 belongs to the first or the second row of the original DataFrame, because both rows carry the label 0. Assignment aligns values by index label, so duplicate labels make the alignment ambiguous, and that is what raises the error. You could invert the operation (take the square root and match the values, for example), but that should not be necessary. It is much easier to start with an index that has unique elements (df = df.reset_index()) before extracting the column. In that case,

df['B'] = b

should work just fine.
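
Putting the pieces together, here is a minimal end-to-end sketch using the example frame from this answer. It assumes the old index labels are not needed, so it passes drop=True to reset_index(); omit that argument to keep the old labels as a regular column instead.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [6, 5, 4]}, index=[0, 0, 1])

# Make the index unique *before* extracting the column.
# drop=True is an assumption here: it discards the old labels.
df = df.reset_index(drop=True)

# Sort and transform the column; its index is now in the order [2, 1, 0].
b = df['B'].sort_values() ** 2

# Assignment aligns on index labels, so each value goes back to its own row.
df['B'] = b

print(df['B'].tolist())  # [36, 25, 16]
```

Because the alignment is done by label, the sorted order of b does not matter once the labels are unique.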
