Python question asked by acdr (4 months ago)

View of a view of a numpy array is a copy?

If you change a view of a numpy array, the original array is also altered. This is intended behaviour.

```python
import numpy as np

arr = np.array([1, 2, 3])
mask = np.array([True, False, False])
arr[mask] = 0
arr
# Out: array([0, 2, 3])
```


However, if I take a view of such a view, and change that, then the original array is not altered:

```python
import numpy as np

arr = np.array([1, 2, 3])
mask_1 = np.array([True, False, False])
mask_1_arr = arr[mask_1]  # Becomes: array([1])
mask_2 = np.array([True])
mask_1_arr[mask_2] = 0
arr
# Out: array([1, 2, 3])
```


This implies to me that, when you take a view of a view, you actually get back a copy. Is this correct? Why is this?

The same behaviour occurs if I use numpy arrays of integer indices instead of a numpy array of boolean values. (E.g. `arr[np.array([0])][np.array([0])] = 0` doesn't change the first element of `arr` to 0.)

Answer

Selection by basic slicing always returns a view. Selection by advanced indexing always returns a copy. Selection by boolean mask is a form of advanced indexing. (The other form of advanced indexing is selection by integer array.)
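One way to check these claims yourself is `np.shares_memory`, which reports whether two arrays overlap in memory. A minimal sketch, using the same three-element array as the question:

```python
import numpy as np

arr = np.array([1, 2, 3])

# Basic slicing: NumPy returns a view that shares arr's buffer.
sliced = arr[0:1]

# Boolean mask: advanced indexing, so NumPy allocates a new array.
masked = arr[np.array([True, False, False])]

# Integer array: also advanced indexing, also a new array.
fancy = arr[np.array([0])]

print(np.shares_memory(arr, sliced))  # True
print(np.shares_memory(arr, masked))  # False
print(np.shares_memory(arr, fancy))   # False
```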

However, assignment by advanced indexing affects the original array.

So

```python
mask = np.array([True, False, False])
arr[mask] = 0
```

affects arr because it is an assignment. In contrast,

```python
mask_1_arr = arr[mask_1]
```

is selection by boolean mask, so mask_1_arr is a copy of part of arr. Once you have a copy, the jig is up. When Python executes

```python
mask_2 = np.array([True])
mask_1_arr[mask_2] = 0
```

the assignment affects mask_1_arr, but since mask_1_arr is a copy, it has no effect on arr.
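If the goal is to zero the element that the two masks pick out together, one possible approach (a sketch, not the only option) is to compose the masks into integer positions first and then assign through `arr` once, so that `arr.__setitem__` does the writing. `np.flatnonzero` is simply one convenient way to turn a boolean mask into positions:

```python
import numpy as np

arr = np.array([1, 2, 3])
mask_1 = np.array([True, False, False])
mask_2 = np.array([True])

# mask_1_arr is a copy, so writing into it leaves arr untouched.
mask_1_arr = arr[mask_1]
mask_1_arr[mask_2] = 0
print(arr)  # [1 2 3]

# Convert mask_1 to the integer positions it selects, narrow those
# positions with mask_2, then assign through arr itself.
positions = np.flatnonzero(mask_1)[mask_2]  # array([0])
arr[positions] = 0
print(arr)  # [0 2 3]
```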


|            | basic slicing    | advanced indexing |
|------------|------------------|-------------------|
| selection  | view             | copy              |
| assignment | affects original | affects original  |
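The assignment row of the table can be checked directly. A small sketch (the values written are arbitrary):

```python
import numpy as np

arr = np.array([1, 2, 3])

# Assignment through a basic slice writes into arr.
arr[1:] = 0
print(arr)  # [1 0 0]

# Assignment through advanced indexing (boolean mask, then integer array)
# also writes into arr.
arr[np.array([True, False, False])] = 7
arr[np.array([2])] = 9
print(arr)  # [7 0 9]
```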

Under the hood, arr[mask] = something causes Python to call arr.__setitem__(mask, something). The ndarray.__setitem__ method is implemented to modify arr. After all, that is the natural thing one should expect __setitem__ to do.
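This also explains the integer-array example in the question. A chained assignment such as `arr[idx][idx] = 0` is a `__getitem__` on `arr` followed by a `__setitem__` on whatever that returned. A sketch that spells out the two calls (the names `idx` and `temp` are just for illustration):

```python
import numpy as np

arr = np.array([1, 2, 3])
idx = np.array([0])

# arr[idx][idx] = 0 is evaluated as these two steps:
temp = arr.__getitem__(idx)  # arr[idx] as an expression: a new array (copy)
temp.__setitem__(idx, 0)     # the write lands in temp, not in arr
print(arr)   # [1 2 3]
print(temp)  # [0]

# A single assignment calls __setitem__ on arr itself.
arr.__setitem__(idx, 0)      # equivalent to arr[idx] = 0
print(arr)   # [0 2 3]
```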

In contrast, when arr[indexer] appears as an expression, Python calls arr.__getitem__(indexer). When indexer is a slice, the selected elements are evenly spaced, so NumPy can return a view that reuses arr's buffer with a new starting offset, shape, and strides. When indexer is an arbitrary boolean mask or an arbitrary array of integers, the selected elements in general have no such regularity: no single stride can describe them, so there is no way to return a view. Hence a copy must be returned.
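The stride difference can be inspected directly. A sketch, assuming an explicit `int64` dtype so that each element is 8 bytes on every platform:

```python
import numpy as np

# Explicit int64 so each element is 8 bytes regardless of platform.
arr = np.arange(6, dtype=np.int64)

# Every other element is a regular pattern: the same buffer with a
# stride of 2 elements * 8 bytes = 16 bytes describes it.
view = arr[::2]
print(arr.strides)       # (8,)
print(view.strides)      # (16,)
print(view.base is arr)  # True

# Positions 0, 1, 4 have no common stride, so NumPy copies them
# into a fresh buffer.
picked = arr[np.array([0, 1, 4])]
print(np.shares_memory(arr, picked))  # False
```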