Dahlai - 6 months ago 35

Python Question

I am wondering why numpy has one-dimensional arrays of shape (length, 1) and also one-dimensional arrays of shape (length,), without a second value.

I am running into this quite frequently, e.g. when using `np.concatenate()`, `reshape`, `hstack`, or `vstack`.

I can't think of a reason why this behavior is desirable. Can someone explain?

It was suggested in one of the comments that my question is a possible duplicate. I am more interested in the underlying workings of NumPy than in the fact that there is a distinction between 1d and 2d arrays, which I think is the point of the mentioned thread.

Answer

The data of an `ndarray` is stored as a 1d buffer, just a block of memory. The multidimensional nature of the array is produced by the `shape` and `strides` attributes, and the code that uses them.
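As a minimal sketch of that idea (the variable names are only illustrative), reshaping a contiguous array gives a view on the same buffer, so only the shape and strides differ:

```python
import numpy as np

buf = np.arange(6)          # 1d array, shape (6,)
view = buf.reshape(2, 3)    # same memory, different shape and strides

# The reshaped array is a view on the same buffer, not a copy.
shares = np.shares_memory(buf, view)

# Writing through the 2d view changes the 1d array as well,
# because element [1, 0] of the view is element [3] of the buffer.
view[1, 0] = 99
```

Strides are counted in bytes, so the 2d view steps `3 * itemsize` bytes per row and `itemsize` bytes per column.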

The `numpy` developers chose to allow for an arbitrary number of dimensions, so the shape and strides are represented as tuples of any length, including 0 and 1.

In contrast, MATLAB was built around FORTRAN programs that were developed for matrix operations. In the early days everything in MATLAB was a 2d matrix. Later (MATLAB 5, in the late 1990s) it was generalized to allow more than 2d, but never less. The `numpy` `np.matrix` class still follows that old 2d MATLAB constraint.

If you come from a MATLAB world you are used to these 2 dimensions, and the distinction between a row vector and a column vector. But in math and physics that isn't influenced by MATLAB, a vector is a 1d array. Python lists are inherently 1d, as are C arrays. To get 2d you have to have lists of lists or arrays of pointers to arrays, with `x[1][2]` style of indexing.

Look at the shape and strides of this array and its variants (strides are in bytes; the values below are for a 4-byte integer dtype):

```
In [48]: x=np.arange(10)
In [49]: x.shape
Out[49]: (10,)
In [50]: x.strides
Out[50]: (4,)
In [51]: x1=x.reshape(10,1)
In [52]: x1.shape
Out[52]: (10, 1)
In [53]: x1.strides
Out[53]: (4, 4)
In [54]: x2=np.concatenate((x1,x1),axis=1)
In [55]: x2.shape
Out[55]: (10, 2)
In [56]: x2.strides
Out[56]: (8, 4)
```

MATLAB adds new dimensions at the end. It orders its values like an `order='F'` array, and can readily change a (n,1) matrix to a (n,1,1,1). `numpy` defaults to `order='C'`, and readily adds a new dimension at the start. Understanding this is essential when taking advantage of broadcasting.
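A small sketch of the two orderings and of adding axes, using an illustrative 6-element array:

```python
import numpy as np

a = np.arange(6)

c = a.reshape(2, 3)              # default order='C': last index varies fastest
f = a.reshape(2, 3, order='F')   # order='F': first index varies fastest, as in MATLAB

# c is [[0, 1, 2], [3, 4, 5]]
# f is [[0, 2, 4], [1, 3, 5]]

lead = a[np.newaxis, :]          # new leading axis: shape (1, 6)
trail = a[:, np.newaxis]         # new trailing axis: shape (6, 1)
```

Broadcasting adds the leading axis automatically when it is needed; a trailing axis always has to be requested explicitly, as in `trail`.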

Thus `x1 + x` broadcasts as (10,1)+(10,) => (10,1)+(1,10) => (10,10).

Because of broadcasting, a `(n,)` array is more like a `(1,n)` one than a `(n,1)` one. A 1d array is more like a row matrix than a column one.
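The broadcasting rule can be checked directly. This sketch reuses the `x` and `x1` names from the session above; `total` and `row_sum` are illustrative names:

```python
import numpy as np

x = np.arange(10)         # shape (10,)
x1 = x.reshape(10, 1)     # shape (10, 1)

# (10, 1) + (10,) -> (10, 1) + (1, 10) -> (10, 10)
total = x1 + x            # total[i, j] == i + j

# A (10,) array combined with a (1, 10) array keeps the row shape,
# which is why 1d arrays behave like rows here.
row_sum = x.reshape(1, 10) + x   # shape (1, 10)
```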

```
In [64]: np.matrix(x)
Out[64]: matrix([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [65]: _.shape
Out[65]: (1, 10)
```

The point with `concatenate` is that it requires matching dimensions. It does not use broadcasting to adjust dimensions. There are a bunch of `stack` functions that ease this constraint, but they do so by adjusting the dimensions before using `concatenate`. Look at their code (readable Python).
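A short sketch of the difference, with illustrative variable names: `concatenate` rejects inputs whose number of dimensions differs, while the stack helpers reshape 1d inputs first.

```python
import numpy as np

x = np.arange(3)          # shape (3,)
x1 = x.reshape(3, 1)      # shape (3, 1)

# concatenate refuses to mix a 2d and a 1d input.
try:
    np.concatenate((x1, x), axis=1)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Making the dimensions match by hand works.
by_hand = np.concatenate((x1, x[:, np.newaxis]), axis=1)   # shape (3, 2)

# The helpers do that adjustment for you.
stacked_rows = np.vstack((x, x))          # 1d inputs become rows: shape (2, 3)
stacked_cols = np.column_stack((x, x))    # 1d inputs become columns: shape (3, 2)
joined_1d = np.hstack((x, x))             # 1d inputs stay 1d: shape (6,)
```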

So a proficient numpy user needs to be comfortable with that generalized `shape` tuple, including the empty `()` (0d array), `(n,)` (1d), and up. For more advanced stuff, understanding strides helps as well (look, for example, at the strides and shape of a transpose).
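To make both ends of that range concrete, here is a minimal sketch of a 0d array and of a transpose (names are illustrative):

```python
import numpy as np

scalar = np.array(5)              # 0d array: shape (), ndim 0, strides ()

m = np.arange(6).reshape(2, 3)    # shape (2, 3)
t = m.T                           # shape (3, 2)

# The transpose is a view on the same buffer:
# its shape and strides are those of m in reverse order.
same_buffer = np.shares_memory(m, t)
```

No data is moved by the transpose; only the `shape` and `strides` tuples change.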