user1514188 user1514188 -4 years ago 141
Python Question

Python - how to correctly index numpy array with other numpy arrays, similarly to MATLAB

I'm trying to learn python after years of using MATLAB and this is something I'm really stuck with. I have an array, say 10 by 8. I want to find rows that have value 3 in the first column and take columns "2:" in that row. What I do is:

newArray = oldArray[np.asarray(np.where(oldArray[:,0] == 3)), 2:]


But that creates a 3-dimensional array with first dimension 1, instead of 2-dimensional array. I'm trying to achieve MATLAB equivalent of

newArray = oldArray(find(oldArray(:,1)==3),3:end);


Anyone have any thoughts on how to do that? Thank you!

Answer

Slice the first column and compare it against 3 to get a boolean mask for selecting rows. After selecting rows by indexing into the first axis of the 2D input array with that mask, select the columns along the second axis. Your MATLAB code uses 3:end. MATLAB indexing is 1-based and NumPy indexing is 0-based, so the third column sits at index 2 in NumPy. MATLAB also needs an explicit end marker, while in NumPy an empty stop in a slice already means "through the last element". So 3:end in MATLAB becomes 2: in NumPy.

Thus, the code would be -

oldArray[oldArray[:,0]==3,2:]

Sample run -

In [352]: a
Out[352]:    |===============>|
array([[1, 0, 4, 2, 0, 1, 3, 2],
       [1, 0, 0, 3, 2, 3, 4, 4],
       [1, 2, 1, 4, 4, 0, 4, 2],
       [0, 2, 0, 3, 2, 2, 1, 2],
       [1, 2, 3, 3, 1, 0, 0, 1],
       [3, 4, 2, 4, 2, 0, 3, 4],  <==
       [3, 1, 1, 0, 0, 1, 2, 0],  <==
       [2, 0, 4, 3, 1, 3, 1, 1],
       [4, 3, 1, 3, 1, 3, 4, 4],
       [2, 0, 2, 0, 3, 1, 1, 1]])

In [353]: a[a[:,0]==3,2:]
Out[353]: 
array([[2, 4, 2, 0, 3, 4],
       [1, 0, 0, 1, 2, 0]])

Reviewing your code -

Your code was -

In [359]: a[np.asarray(np.where(a[:,0] == 3)), 2:]
Out[359]: 
array([[[2, 4, 2, 0, 3, 4],
        [1, 0, 0, 1, 2, 0]]])

That works too, but creates a 3D array as listed in the question.

Dissecting it -

In [361]: np.where(a[:,0] == 3)
Out[361]: (array([5, 6]),)

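We see that np.where returns a tuple of index arrays, one per dimension of its input (row and column indices for a 2D input). The input here, a[:,0] == 3, is 1D, so the tuple holds just one array of indices. Wrapping that tuple in np.asarray turns it into a 2D array of shape (1, 2), and indexing the rows with a 2D array is what adds the extra leading dimension.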
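
To see where the extra dimension comes from, it can help to inspect the shapes at each step. The following is a minimal sketch that rebuilds the sample array a from the run above; the names idx_tuple, idx_2d and result are only illustrative.

```python
import numpy as np

# Sample array from the run above
a = np.array([
    [1, 0, 4, 2, 0, 1, 3, 2],
    [1, 0, 0, 3, 2, 3, 4, 4],
    [1, 2, 1, 4, 4, 0, 4, 2],
    [0, 2, 0, 3, 2, 2, 1, 2],
    [1, 2, 3, 3, 1, 0, 0, 1],
    [3, 4, 2, 4, 2, 0, 3, 4],
    [3, 1, 1, 0, 0, 1, 2, 0],
    [2, 0, 4, 3, 1, 3, 1, 1],
    [4, 3, 1, 3, 1, 3, 4, 4],
    [2, 0, 2, 0, 3, 1, 1, 1],
])

# np.where on a 1D condition returns a tuple holding one index array
idx_tuple = np.where(a[:, 0] == 3)

# Converting the tuple to an array stacks it into a 2D array
idx_2d = np.asarray(idx_tuple)

# A 2D row index contributes two dimensions; the column slice adds a third
result = a[idx_2d, 2:]

print(len(idx_tuple), idx_2d.shape, result.shape)
```

The printed shapes show that the row index is already 2D before it is ever used, so the 3D result follows directly from the index shape rather than from the column slice.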

In MATLAB, find gives you an array of indices, so there's less confusion -

>> a
a =
     3     4     3     3
     2     5     5     2
     2     2     2     3
     5     3     4     4
     4     3     4     2
     3     2     4     2
>> find(a(:,1)==3)
ans =
     1
     6

So, to get those indices, get the first array out of it -

In [362]: np.where(a[:,0] == 3)[0]
Out[362]: array([5, 6])

Use it to index into the first axis and then slice the columns from index 2 onwards -

In [363]: a[np.where(a[:,0] == 3)[0]]
Out[363]: 
array([[3, 4, 2, 4, 2, 0, 3, 4],
       [3, 1, 1, 0, 0, 1, 2, 0]])

In [364]: a[np.where(a[:,0] == 3)[0],2:]
Out[364]: 
array([[2, 4, 2, 0, 3, 4],
       [1, 0, 0, 1, 2, 0]])

That gives you the expected output.
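
As a side note, np.flatnonzero is arguably the closest NumPy counterpart to MATLAB's find for a 1D condition, since it returns a plain index array rather than a tuple. A small sketch, reusing the sample array from above (the names rows and result are only illustrative):

```python
import numpy as np

# Sample array from the run above
a = np.array([
    [1, 0, 4, 2, 0, 1, 3, 2],
    [1, 0, 0, 3, 2, 3, 4, 4],
    [1, 2, 1, 4, 4, 0, 4, 2],
    [0, 2, 0, 3, 2, 2, 1, 2],
    [1, 2, 3, 3, 1, 0, 0, 1],
    [3, 4, 2, 4, 2, 0, 3, 4],
    [3, 1, 1, 0, 0, 1, 2, 0],
    [2, 0, 4, 3, 1, 3, 1, 1],
    [4, 3, 1, 3, 1, 3, 4, 4],
    [2, 0, 2, 0, 3, 1, 1, 1],
])

# Indices of the rows whose first column equals 3, as a flat 1D array
rows = np.flatnonzero(a[:, 0] == 3)

# Index rows with the integer array, then slice columns from index 2 onwards
result = a[rows, 2:]

print(rows)
print(result)
```

This stays 2D because the row index is a 1D array and the column index is a slice.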


Word of caution

One needs to be careful when indexing more than one axis with masks or integer arrays.

In theory, the column slice 2: should be equivalent to indexing with the list [2,3,4,5,6,7], since a has 8 columns.

Let's try that -

In [370]: a[a[:,0]==3,[2,3,4,5,6,7]]
....
IndexError: shape mismatch: indexing arrays could ...
     not be broadcast together with shapes (2,) (6,) 

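This triggers broadcasted (advanced) indexing: when more than one axis is indexed with an array, NumPy broadcasts the index arrays against each other. Here the index arrays for the two axes have different lengths and cannot be broadcast together.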

Let's verify that. The array for indexing into rows -

In [374]: a[:,0]==3
Out[374]: array([False, False, False, False, False,  True,  True, False, False, False], dtype=bool)

A boolean mask used as an index behaves like the integer indices of its True elements, so that is essentially an array of two elements -

In [375]: np.where(a[:,0]==3)[0]
Out[375]: array([5, 6])

The array for indexing into columns was [2,3,4,5,6,7], which has length 6 and thus is not broadcastable against the two row indices.
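
The pairing behaviour behind that error can be checked directly. The sketch below rebuilds the sample array and uses illustrative names (rows, cols, mismatch_raised, paired); it shows that index arrays of different lengths fail, while index arrays of equal length are matched up element by element rather than forming a grid.

```python
import numpy as np

# Sample array from the run above
a = np.array([
    [1, 0, 4, 2, 0, 1, 3, 2],
    [1, 0, 0, 3, 2, 3, 4, 4],
    [1, 2, 1, 4, 4, 0, 4, 2],
    [0, 2, 0, 3, 2, 2, 1, 2],
    [1, 2, 3, 3, 1, 0, 0, 1],
    [3, 4, 2, 4, 2, 0, 3, 4],
    [3, 1, 1, 0, 0, 1, 2, 0],
    [2, 0, 4, 3, 1, 3, 1, 1],
    [4, 3, 1, 3, 1, 3, 4, 4],
    [2, 0, 2, 0, 3, 1, 1, 1],
])

rows = np.where(a[:, 0] == 3)[0]   # shape (2,)
cols = np.arange(2, a.shape[1])    # shape (6,)

# Shapes (2,) and (6,) cannot be broadcast together, so this raises
try:
    a[rows, cols]
    mismatch_raised = False
except IndexError:
    mismatch_raised = True

# Equal-length index arrays are paired up: this picks a[5, 2] and a[6, 3]
paired = a[rows, cols[:2]]

print(mismatch_raised, paired)
```

So even when the lengths happen to match, indexing both axes with arrays selects individual elements, not the full block of rows and columns.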

To select row IDs 5,6 and, for each of those rows, column IDs 2,3,4,5,6,7, we could create open meshes with np.ix_ that are broadcastable, like so -

In [376]: np.ix_(a[:,0]==3, [2,3,4,5,6,7])
Out[376]: 
(array([[5],
        [6]]), array([[2, 3, 4, 5, 6, 7]]))

Finally, index into the input array with those to get the desired output -

In [377]: a[np.ix_(a[:,0]==3, [2,3,4,5,6,7])]
Out[377]: 
array([[2, 4, 2, 0, 3, 4],
       [1, 0, 0, 1, 2, 0]])
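
For comparison, the open mesh can also be built by hand: giving the row indices a trailing axis makes their shape (2, 1), which broadcasts against the (6,) column indices. This is a sketch under the assumption that the sample array above is used; the names via_slice, via_ix and via_manual are only illustrative.

```python
import numpy as np

# Sample array from the run above
a = np.array([
    [1, 0, 4, 2, 0, 1, 3, 2],
    [1, 0, 0, 3, 2, 3, 4, 4],
    [1, 2, 1, 4, 4, 0, 4, 2],
    [0, 2, 0, 3, 2, 2, 1, 2],
    [1, 2, 3, 3, 1, 0, 0, 1],
    [3, 4, 2, 4, 2, 0, 3, 4],
    [3, 1, 1, 0, 0, 1, 2, 0],
    [2, 0, 4, 3, 1, 3, 1, 1],
    [4, 3, 1, 3, 1, 3, 4, 4],
    [2, 0, 2, 0, 3, 1, 1, 1],
])

mask = a[:, 0] == 3
rows = np.where(mask)[0]           # shape (2,)
cols = np.arange(2, a.shape[1])    # shape (6,)

via_slice = a[mask, 2:]                # mask on rows, slice on columns
via_ix = a[np.ix_(mask, cols)]         # open mesh built by np.ix_
via_manual = a[rows[:, None], cols]    # (2, 1) broadcast against (6,)

print(np.array_equal(via_slice, via_ix), np.array_equal(via_slice, via_manual))
```

When the columns form a contiguous range, the plain slice is the simplest option; np.ix_ or the manual form is mainly useful when the wanted columns are an arbitrary list.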