broncoAbierto broncoAbierto - 1 month ago 13
Python Question

Matrix indexing in Numpy

I was growing confused during the development of a small Python script involving matrix operations, so I fired up a shell to play around with a toy example and develop a better understanding of matrix indexing in Numpy.

This is what I did:

>>> import numpy as np
>>> A = np.matrix([1,2,3])
>>> A
matrix([[1, 2, 3]])
>>> A[0]
matrix([[1, 2, 3]])
>>> A[0][0]
matrix([[1, 2, 3]])
>>> A[0][0][0]
matrix([[1, 2, 3]])
>>> A[0][0][0][0]
matrix([[1, 2, 3]])


As you can imagine, this has not helped me develop a better understanding of matrix indexing in Numpy. This behavior would make sense for something that I would describe as "An array of itself", but I doubt anyone in their right mind would choose that as a model for matrices in a scientific library.

What is, then, the logic to the output I obtained? Why would the first element of a matrix object be itself?

PS: I know how to obtain the first entry of the matrix. What I am interested in is the logic behind this design decision.

EDIT: I'm not asking how to access a matrix element, or why a matrix row behaves like a matrix. I'm asking for a definition of the behavior of a matrix when indexed with a single number. It's an action typical of arrays, but the resulting behavior is nothing like the one you would expect from an array. I would like to know how this is implemented and what's the logic behind the design decision.

Answer Source

Look at the shape after indexing:

In [295]: A=np.matrix([1,2,3])
In [296]: A.shape
Out[296]: (1, 3)
In [297]: A[0]
Out[297]: matrix([[1, 2, 3]])
In [298]: A[0].shape
Out[298]: (1, 3)

The key to this behavior is that np.matrix is always 2d. So even if you select one row (A[0,:]), the result is still 2d, shape (1,3). So you can string along as many [0] as you like, and nothing new happens.

What are you trying to accomplish with A[0][0]? The same as A[0,0]? For the base np.ndarray class these are equivalent.

Note that Python interpreter translates indexing to __getitem__ calls.

 A.__getitem__(0).__getitem__(0)
 A.__getitem__((0,0))

[0][0] is 2 indexing operations, not one. So the effect of the second [0] depends on what the first produces.

For an array A[0,0] is equivalent to A[0,:][0]. But for a matrix, you need to do:

In [299]: A[0,:][:,0]
Out[299]: matrix([[1]])  # still 2d

=============================

"An array of itself", but I doubt anyone in their right mind would choose that as a model for matrices in a scientific library.

What is, then, the logic to the output I obtained? Why would the first element of a matrix object be itself?

In addition, A[0,:] is not the same as A[0]

In light of these comments let me suggest some clarifications.

A[0] does not mean 'return the 1st element'. It means select along the 1st axis. For a 1d array that means the 1st item. For a 2d array it means the 1st row. For ndarray that would be a 1d array, but for a matrix it is another matrix. So for a 2d array or matrix, A[i,:] is the same thing as A[i].

A[0] does not just return itself. It returns a new matrix. Different id:

In [303]: id(A)
Out[303]: 2994367932
In [304]: id(A[0])
Out[304]: 2994532108

It may have the same data, shape and strides, but it's a new object. It's just as unique as the ith row of a many row matrix.

Most of the unique matrix activity is defined in: numpy/matrixlib/defmatrix.py. I was going to suggest looking at the matrix.__getitem__ method, but most of the action is performed in np.ndarray.__getitem__.

np.matrix class was added to numpy as a convenience for old-school MATLAB programmers. numpy arrays can have almost any number of dimensions, 0, 1, .... MATLAB allowed only 2, though a release around 2000 generalized it to 2 or more.