off99555 - 5 months ago

Shorter version of this numpy array indexing

I have the following Python code, which works with both NumPy arrays and scipy.sparse matrices:

X[a,:][:,b]


But it doesn't look elegant. 'a' and 'b' are 1-D boolean masks.

'a' has the same length as X.shape[0], and 'b' has the same length as X.shape[1].

I tried
X[a,b]
but it doesn't work.
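For context, here is a minimal sketch of why X[a,b] doesn't give a submatrix (the arrays are made up for illustration). When both indices are boolean masks, NumPy treats each one as the integer positions of its True values and pairs those positions element by element. The result is a handful of individual elements rather than the row/column cross product, or an IndexError when the masks have different numbers of True values.

```python
import numpy as np

# Hypothetical 4x5 array and masks, chosen only for illustration
X = np.arange(20).reshape(4, 5)
a = np.array([True, False, True, True])          # rows 0, 2, 3
b = np.array([False, True, True, False, True])   # columns 1, 2, 4

# Both masks have three True values, so NumPy pairs them up and picks
# X[0, 1], X[2, 2], X[3, 4]; the result is 1-D, not a 3x3 block
paired = X[a, b]
print(paired)  # [ 1 12 19]

# The two-step version from the question selects the full 3x3 block
block = X[a, :][:, b]
print(block.shape)  # (3, 3)

# With a different number of True values the pairing cannot broadcast
b2 = np.array([False, True, True, False, False])  # only two columns
try:
    X[a, b2]
    mismatch_raised = False
except IndexError as err:
    mismatch_raised = True
    print("IndexError:", err)
```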

What I am trying to accomplish is to select particular rows and columns at the same time. For example, select rows 0, 7, and 8, then from that result keep only columns 2, 3, and 4.

How would you make this shorter and more elegant?

Answer

You could use np.ix_ for this kind of broadcasted indexing, like so:

X[np.ix_(a,b)]

This isn't any shorter than the original code, but it should be faster because it avoids an intermediate array. The original code first builds X[a,:] with one indexing step and then indexes that result again with [:,b] to produce the final output, whereas X[np.ix_(a,b)] gathers the final output in a single step.
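To make the "broadcasted indexing" concrete, here is a sketch on made-up data of what np.ix_ produces. It converts each mask to the integer positions of its True entries and reshapes them into an open mesh: a column of row indices with shape (k, 1) and a row of column indices with shape (1, l). Indexing with those two arrays broadcasts to a (k, l) grid. The hand-built np.flatnonzero version is only an equivalent spelled out for illustration, not something np.ix_ literally calls.

```python
import numpy as np

X = np.arange(20).reshape(4, 5)
a = np.array([True, False, True, True])          # keep rows 0, 2, 3
b = np.array([False, True, True, False, True])   # keep columns 1, 2, 4

rows, cols = np.ix_(a, b)
print(rows.shape, cols.shape)      # (3, 1) (1, 3)
print(rows.ravel(), cols.ravel())  # [0 2 3] [1 2 4]

# The same open mesh built by hand: row indices as a column vector,
# column indices as a row vector; together they broadcast to a 3x3 grid
manual = X[np.flatnonzero(a)[:, None], np.flatnonzero(b)[None, :]]

# One advanced-indexing step, no intermediate X[a, :] copy
result = X[np.ix_(a, b)]
print(np.array_equal(result, manual))  # True
```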

Also, this method works whether a and b are integer index arrays or boolean masks.
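As a quick check of that claim, the sketch below uses a hypothetical 10x5 array and the selection from the question (rows 0, 7, 8 and columns 2, 3, 4), once as integer lists and once as equivalent boolean masks. Integer indices additionally let you reorder or repeat rows and columns, which a mask cannot express.

```python
import numpy as np

# Hypothetical 10x5 array; the selection mirrors the question's example
X = np.arange(50).reshape(10, 5)

# Integer index arrays
rows = [0, 7, 8]
cols = [2, 3, 4]
by_int = X[np.ix_(rows, cols)]

# Equivalent boolean masks of length X.shape[0] and X.shape[1]
a = np.zeros(X.shape[0], dtype=bool)
a[rows] = True
b = np.zeros(X.shape[1], dtype=bool)
b[cols] = True
by_mask = X[np.ix_(a, b)]

print(by_int)
# [[ 2  3  4]
#  [37 38 39]
#  [42 43 44]]
print(np.array_equal(by_int, by_mask))  # True
```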

Sample run

In [141]: X = np.random.randint(0,99,(6,5))

In [142]: m,n = X.shape

In [143]: a = np.in1d(np.arange(m),np.random.randint(0,m,(m)))

In [144]: b = np.in1d(np.arange(n),np.random.randint(0,n,(n)))

In [145]: X[a,:][:,b]
Out[145]: 
array([[17, 81, 64],
       [87, 16, 54],
       [98, 22, 11],
       [26, 54, 64]])

In [146]: X[np.ix_(a,b)]
Out[146]: 
array([[17, 81, 64],
       [87, 16, 54],
       [98, 22, 11],
       [26, 54, 64]])
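The session above builds its masks with np.in1d, which NumPy 2.0 deprecated in favor of np.isin. A seeded, script-style version of the same comparison might look like this; the seed is an arbitrary assumption, so the selected values will differ from the session above.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed so the run is repeatable
X = rng.integers(0, 99, size=(6, 5))
m, n = X.shape

# Random boolean masks: True where the index appears in a random draw
a = np.isin(np.arange(m), rng.integers(0, m, size=m))
b = np.isin(np.arange(n), rng.integers(0, n, size=n))

two_step = X[a, :][:, b]
one_step = X[np.ix_(a, b)]

print(one_step.shape == (a.sum(), b.sum()))  # True
print(np.array_equal(two_step, one_step))    # True
```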

Runtime test

In [147]: X = np.random.randint(0,99,(600,500))

In [148]: m,n = X.shape

In [149]: a = np.in1d(np.arange(m),np.random.randint(0,m,(m)))

In [150]: b = np.in1d(np.arange(n),np.random.randint(0,n,(n)))

In [151]: %timeit X[a,:][:,b]
1000 loops, best of 3: 1.74 ms per loop

In [152]: %timeit X[np.ix_(a,b)]
1000 loops, best of 3: 1.24 ms per loop
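The %timeit lines are IPython magics. Outside IPython, a roughly comparable measurement can be made with the standard-library timeit module. The sketch below reuses the 600x500 setup (with np.isin for the masks and an arbitrary seed); absolute numbers depend on hardware and NumPy version, so no timings are claimed here.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
X = rng.integers(0, 99, size=(600, 500))
m, n = X.shape
a = np.isin(np.arange(m), rng.integers(0, m, size=m))
b = np.isin(np.arange(n), rng.integers(0, n, size=n))

def two_step():
    return X[a, :][:, b]  # builds the intermediate X[a, :] first

def one_step():
    return X[np.ix_(a, b)]  # single advanced-indexing step

# Sanity check: both approaches must agree before timing them
assert np.array_equal(two_step(), one_step())

for fn in (two_step, one_step):
    # best of 3 repeats, 50 calls each, reported per call in milliseconds
    best = min(timeit.repeat(fn, number=50, repeat=3)) / 50
    print(f"{fn.__name__}: {best * 1e3:.3f} ms per call")
```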