user40314 user40314 - 2 months ago 23
Python Question

checking non-ambiguity of strides in numpy array

For a numpy array

X
, the location of its element
X[k[0], ..., k[d-1]]
is offset from the location of
X[0,..., 0]
by
k[0]*s[0] + ... + k[d-1]*s[d-1]
, where
(s[0],...,s[d-1])
is the tuple representing
X.strides
.

As far as I understand nothing in numpy array specs requires that distinct indexes of array
X
correspond to distinct addresses in memory, the simplest instance of this being a zero value of the stride, e.g. see advanced NumPy section of scipy lectures.

Does the numpy have a built-in predicate to test if the strides and the shape are such that distinct indexes map to distinct memory addresses?

If not, how does one write one, preferably so as to avoid sorting of the strides?

Answer

edit: It took me a bit to figure what you are asking about. With striding tricks it's possible to index the same element in a databuffer in different ways, and broadcasting actually does this under the covers. Normally we don't worry about it because it is either hidden or intentional.

Recreating in the strided mapping and looking for duplicates may be the only way to test this. I'm not aware of any existing function that checks it.

==================

I'm not quite sure what you concerned with. But let me illustrate how shape and strides work

Define a 3x4 array:

In [453]: X=np.arange(12).reshape(3,4)
In [454]: X.shape
Out[454]: (3, 4)
In [455]: X.strides
Out[455]: (16, 4)

Index an item

In [456]: X[1,2]
Out[456]: 6

I can get it's index in a flattened version of the array (e.g. the original arange) with ravel_multi_index:

In [457]: np.ravel_multi_index((1,2),X.shape)
Out[457]: 6

I can also get this location using strides - keeping mind that strides are in bytes (here 4 bytes per item)

In [458]: 1*16+2*4
Out[458]: 24
In [459]: (1*16+2*4)/4
Out[459]: 6.0

All these numbers are relative to the start of the data buffer. We can get the data buffer address from X.data or X.__array_interface__['data'], but usually don't need to.

So this strides tells us that to go from entry to the next, step 4 bytes, and to go from one row to the next step 16. 6 is located at one row down, 2 over, or 24 bytes into the buffer.

In the as_strided example of your link, strides=(1*2, 0) produces repeated indexing of specific values.

With my X:

In [460]: y=np.lib.stride_tricks.as_strided(X,strides=(16,0), shape=(3,4))
In [461]: y
Out[461]: 
array([[0, 0, 0, 0],
       [4, 4, 4, 4],
       [8, 8, 8, 8]])

y is a 3x4 that repeatedly indexes the 1st column of X.

Changing one item in y ends up changing one value in X but a whole row in y:

In [462]: y[1,2]=10
In [463]: y
Out[463]: 
array([[ 0,  0,  0,  0],
       [10, 10, 10, 10],
       [ 8,  8,  8,  8]])
In [464]: X
Out[464]: 
array([[ 0,  1,  2,  3],
       [10,  5,  6,  7],
       [ 8,  9, 10, 11]])

as_strided can produce some weird effects if you aren't careful.

OK, maybe I've figured out what's bothering you - can I identify a situation like this where two different indexing tuples end up pointing to the same location in the data buffer? Not that I'm aware of. That y strides contains a 0 is a pretty good indicator.

as_stridedis often used to create overlapping windows:

In [465]: y=np.lib.stride_tricks.as_strided(X,strides=(8,4), shape=(3,4))
In [466]: y
Out[466]: 
array([[ 0,  1,  2,  3],
       [ 2,  3, 10,  5],
       [10,  5,  6,  7]])
In [467]: y[1,2]=20
In [469]: y
Out[469]: 
array([[ 0,  1,  2,  3],
       [ 2,  3, 20,  5],
       [20,  5,  6,  7]])

Again changing 1 item in y ends up changing 2 values in y, but only 1 in X.

Ordinary array creation and indexing does not have this duplicate indexing issue. Broadcasting may do something like, under the cover, where a (4,) array is changed to (1,4) and then to (3,4), effectively replicating rows. I think there's another stride_tricks function that does this explicitly.

In [475]: x,y=np.lib.stride_tricks.broadcast_arrays(X,np.array([.1,.2,.3,.4]))
In [476]: x
Out[476]: 
array([[ 0,  1,  2,  3],
       [20,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [477]: y
Out[477]: 
array([[ 0.1,  0.2,  0.3,  0.4],
       [ 0.1,  0.2,  0.3,  0.4],
       [ 0.1,  0.2,  0.3,  0.4]])
In [478]: y.strides
Out[478]: (0, 8)

In any case, in normal array use we don't have to worry about this ambiguity. We get it only with intentional actions, not accidental ones.

==============

How about this for a test:

def dupstrides(x):
    uniq={sum(s*j for s,j in zip(x.strides,i)) for i in np.ndindex(x.shape)}
    print(uniq)
    print(len(uniq))
    print(x.size)
    return len(uniq)<x.size

In [508]: dupstrides(X)
{0, 32, 4, 36, 8, 40, 12, 44, 16, 20, 24, 28}
12
12
Out[508]: False
In [509]: dupstrides(y)
{0, 4, 8, 12, 16, 20, 24, 28}
8
12
Out[509]: True