acdr - 9 months ago

Python Question

If you change a view of a numpy array, the original array is also altered. This is intended behaviour.

```
arr = np.array([1,2,3])
mask = np.array([True, False, False])
arr[mask] = 0
arr
# Out: array([0, 2, 3])
```

However, if I take a view of such a view and change that, then the original array is not altered.

```
arr = np.array([1,2,3])
mask_1 = np.array([True, False, False])
mask_1_arr = arr[mask_1] # Becomes: array([1])
mask_2 = np.array([True])
mask_1_arr[mask_2] = 0
arr
# Out: array([1, 2, 3])
```

This implies to me that, when you take a view of a view, you actually get back a copy. Is this correct? Why is this?

The same behaviour occurs if I use numpy arrays of numerical indices instead of a numpy array of boolean values, e.g.:

```
arr[np.array([0])][np.array([0])] = 0
arr
```

Answer

Selection by basic slicing always returns a view. Selection by advanced indexing always returns a copy. Selection by boolean mask is a form of advanced indexing. (The other form of advanced indexing is selection by integer array.)
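The view/copy distinction can be checked directly. The following is a minimal sketch using `np.shares_memory`; the variable names (`sliced`, `masked`, `picked`, and the `*_shares` flags) are made up for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3])

sliced = arr[0:2]                               # basic slicing
masked = arr[np.array([True, False, False])]    # boolean mask (advanced indexing)
picked = arr[np.array([0])]                     # integer array (advanced indexing)

slice_shares = np.shares_memory(arr, sliced)    # True: sliced is a view
mask_shares = np.shares_memory(arr, masked)     # False: masked is a copy
int_shares = np.shares_memory(arr, picked)      # False: picked is a copy

# Writing through the view changes arr; writing to a copy does not.
sliced[0] = 10
masked[0] = 99
print(arr)  # [10  2  3]
```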

However, **assignment** by advanced indexing affects the original array.

So

```
mask = np.array([True, False, False])
arr[mask] = 0
```

affects `arr` because it is an assignment. In contrast,

```
mask_1_arr = arr[mask_1]
```

is selection by boolean mask, so `mask_1_arr` is a copy of part of `arr`.

Once you have a copy, the jig is up. When Python executes

```
mask_2 = np.array([True])
mask_1_arr[mask_2] = 0
```

the assignment affects `mask_1_arr`, but since `mask_1_arr` is a copy, it has no effect on `arr`.
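If the goal is to apply the second mask and still modify `arr`, one possible approach (a sketch, not the only way) is to fold both masks into a single set of positions and do one indexed assignment on `arr` itself. The name `idx` is hypothetical; `np.flatnonzero` returns the positions where `mask_1` is `True`.

```python
import numpy as np

arr = np.array([1, 2, 3])
mask_1 = np.array([True, False, False])
mask_2 = np.array([True])

# Positions in arr selected by mask_1, then narrowed by mask_2.
idx = np.flatnonzero(mask_1)[mask_2]

# A single indexed assignment on arr itself, so arr is modified.
arr[idx] = 0
print(arr)  # [0 2 3]
```

Because the only assignment is `arr[idx] = 0`, no intermediate copy sits between the assignment and the original array.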

```
|            | basic slicing    | advanced indexing |
|------------+------------------+-------------------|
| selection  | view             | copy              |
| assignment | affects original | affects original  |
```

Under the hood, `arr[mask] = something` causes Python to call `arr.__setitem__(mask, something)`. The `ndarray.__setitem__` method is implemented to modify `arr`. After all, that is the natural thing one should expect `__setitem__` to do.
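To make the dispatch concrete, here is a small sketch that spells out the method calls by hand. It also shows why the chained form from the question leaves `arr` alone: the `__setitem__` call lands on the temporary array returned by `__getitem__`. The names `tmp`, `after_direct`, and `after_chained` are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3])
mask = np.array([True, False, False])

# Equivalent to: arr[mask] = 0
arr.__setitem__(mask, 0)
after_direct = arr.copy()              # arr is now [0, 2, 3]

# Equivalent to: arr[mask][np.array([True])] = 7
tmp = arr.__getitem__(mask)            # advanced indexing: tmp holds copied data
tmp.__setitem__(np.array([True]), 7)   # modifies tmp only
after_chained = arr.copy()             # arr is still [0, 2, 3]

print(after_direct, after_chained, tmp)
```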

In contrast, as an expression, `arr[indexer]` causes Python to call `arr.__getitem__(indexer)`. When `indexer` is a slice, the regularity of the elements allows NumPy to return a view (by adjusting the offset and strides). When `indexer` is an arbitrary boolean mask or arbitrary array of integers, there is in general no regularity to the elements selected, so there is no way to return a view. Hence a copy must be returned.
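The role of strides can be seen by inspecting the `strides` attribute. This sketch assumes an `int64` array (8 bytes per element) so the byte counts are predictable; the variable names are illustrative.

```python
import numpy as np

a = np.arange(10, dtype=np.int64)   # 8 bytes per element

every_other = a[::2]    # step 2: twice the byte stride
reverse = a[::-1]       # step -1: negative byte stride

print(a.strides)            # (8,)
print(every_other.strides)  # (16,)
print(reverse.strides)      # (-8,)

# The sliced view borrows a's buffer instead of owning its own.
print(every_other.base is a)   # True

# Positions 0, 3, 4 have no constant step, so no single stride can
# describe them and the result gets its own memory.
irregular = a[np.array([0, 3, 4])]
print(np.shares_memory(a, irregular))   # False
```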