LUSAQX - 2 months ago
Python Question

What effect does assigning through DataFrame.loc have on the original data frame?

I used numpy.random.permutation() to generate a random ordering of the rows of an original data frame X, and I want to assign the rows of X to X_perm in that random order.

X_perm = X
y_perm = y
perm = np.random.permutation(X.shape[0])
for i in range(len(perm)):
    X_perm.loc[i] = X.loc[perm[i]]
    y_perm.loc[i] = y.loc[perm[i]]


I just found that after running the code, the first record of X, given by X[0:1], differs from what it was before running.

Strange. I didn't perform any operation on X other than assigning its values to a new data frame. How did that alter the values in X?
Cheers

Answer

The reason for this unexpected behavior is that X_perm = X does not create a new data frame. It only binds a second name, X_perm, to the same object as X, so every modification made through X_perm is also a modification of X.

To demonstrate this:

import numpy as np

a = np.arange(16)
print(a)
b = a  # like your X_perm = X: b is a second name for the same array
print(b)  # same as print(a) above
b[0] = -999
print(a)  # has been modified
print(b)  # has been modified

a[-1] = -999
print(a)  # has been modified
print(b)  # has been modified

# using copy
a = np.arange(16)
print(a)
b = a.copy()  # b is now an independent copy of the array
print(b)  # same as print(a) above
b[0] = -999
print(a)  # has NOT been modified
print(b)  # has been modified

a[-1] = -999
print(a)  # has been modified
print(b)  # has NOT been modified
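
The same check can be made directly on a data frame. The following is only a small sketch: the frame X and the names X_alias and X_copy are made up for illustration and are not taken from your code.

```python
import pandas as pd

# Hypothetical stand-in for the asker's X
X = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

X_alias = X        # a second name for the same DataFrame object
X_copy = X.copy()  # an independent DataFrame (copy() is deep by default)

print(X_alias is X)  # True: both names refer to one object
print(X_copy is X)   # False: a separate object

X_alias.loc[0, "a"] = -999  # writes through to X
X_copy.loc[1, "a"] = -111   # leaves X untouched

print(X.loc[0, "a"])  # -999
print(X.loc[1, "a"])  # 2
```

The `is` operator is a quick way to tell whether two names point at the same object before you start assigning into one of them.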

To do what you want, X_perm needs to be a copy of X (and, likewise, y_perm a copy of y).

X_perm = X.copy()
y_perm = y.copy()
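
As an alternative to the row-by-row loop, the permutation can probably be applied in one step with positional indexing. This is a sketch under assumptions: the small X and y below are invented sample data, the seed is there only to make the run repeatable, and it assumes you want a fresh 0..n-1 index on the result.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the asker's X and y
X = pd.DataFrame({"a": [10, 20, 30, 40], "b": [1.0, 2.0, 3.0, 4.0]})
y = pd.Series([0, 1, 0, 1])

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
perm = rng.permutation(len(X))  # random ordering of the row positions

# Positional indexing with an array returns new objects,
# so X and y themselves are not modified.
X_perm = X.iloc[perm].reset_index(drop=True)
y_perm = y.iloc[perm].reset_index(drop=True)
```

Because .iloc with an array of positions builds a new object, no explicit copy() is needed here, and X and y stay aligned since the same perm is applied to both.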

See also this relevant numpy doc on copy