Dahlai - 1 year ago 185

# numpy: Why is there a difference between (x,1) and (x, ) dimensionality

I am wondering why numpy has one-dimensional arrays of shape (length, 1) as well as one-dimensional arrays of shape (length,), without a second value.

I am running into this quite frequently, e.g. when using `np.concatenate()`, which then requires a `reshape` step beforehand (or I could directly use `hstack`/`vstack`).

I can't think of a reason why this behavior is desirable. Can someone explain?

Edit:

It was suggested in one of the comments that my question is a possible duplicate. I am more interested in the underlying workings of numpy than in the fact that there is a distinction between 1d and 2d arrays, which I think is the point of the mentioned thread.

The data of an `ndarray` is stored as a 1d buffer - just a block of memory. The multidimensional nature of the array is produced by the `shape` and `strides` attributes, and the code that uses them.
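As a minimal sketch of that idea (the variable names are only illustrative), reshaping a contiguous array gives a second view of the same buffer rather than a copy, so a write through one shape is visible through the other:

```python
import numpy as np

x = np.arange(10)          # 1d array, shape (10,)
x1 = x.reshape(10, 1)      # same data viewed with shape (10, 1)

# Both arrays are backed by the same block of memory.
shares = np.shares_memory(x, x1)

# Writing through the reshaped view changes the original as well.
x1[3, 0] = 99
changed = x[3]

print(x.shape, x1.shape)   # (10,) (10, 1)
print(shares, changed)     # True 99
```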

The `numpy` developers chose to allow for an arbitrary number of dimensions, so the shape and strides are represented as tuples of any length, including 0 and 1.

In contrast MATLAB was built around FORTRAN programs that were developed for matrix operations. In the early days everything in MATLAB was a 2d matrix. Around 2000 (v3.5) it was generalized to allow more than 2d, but never less. The `numpy` `np.matrix` still follows that old 2d MATLAB constraint.

If you come from a MATLAB world you are used to these 2 dimensions, and to the distinction between a row vector and a column vector. But in math and physics that isn't influenced by MATLAB, a vector is a 1d array. Python lists are inherently 1d, as are C arrays. To get 2d you have to have lists of lists or arrays of pointers to arrays, with the `x[1][2]` style of indexing.

Look at the shape and strides of this array and its variants:

``````
In [47]: import numpy as np

In [48]: x=np.arange(10)

In [49]: x.shape
Out[49]: (10,)

# strides are in bytes; 4 here reflects a 4-byte integer dtype
# (with an 8-byte integer dtype each value would be doubled)
In [50]: x.strides
Out[50]: (4,)

In [51]: x1=x.reshape(10,1)

In [52]: x1.shape
Out[52]: (10, 1)

In [53]: x1.strides
Out[53]: (4, 4)

In [54]: x2=np.concatenate((x1,x1),axis=1)

In [55]: x2.shape
Out[55]: (10, 2)

In [56]: x2.strides
Out[56]: (8, 4)
``````

MATLAB adds new dimensions at the end. It orders its values like an `order='F'` array, and can readily change a (n,1) matrix to a (n,1,1,1). `numpy` defaults to `order='C'`, and readily adds new dimensions at the start. Understanding this is essential when taking advantage of broadcasting.

Thus `x1 + x` is a (10,1)+(10,) => (10,1)+(1,10) => (10,10)

Because of broadcasting a `(n,)` array is more like a `(1,n)` one than a `(n,1)` one. A 1d array is more like a row matrix than a column one.

``````
In [64]: np.matrix(x)
Out[64]: matrix([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

In [65]: _.shape
Out[65]: (1, 10)
``````
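The broadcasting rule described above can be checked directly. This is a small sketch with illustrative variable names: the `(10,)` array is treated as `(1, 10)`, so adding it to a `(10, 1)` array gives a `(10, 10)` result, identical to what you get when the leading axis is added by hand.

```python
import numpy as np

x = np.arange(10)              # shape (10,)
x1 = x.reshape(10, 1)          # shape (10, 1)

total = x1 + x                 # (10, 1) + (10,) -> (10, 10)
same = x1 + x[np.newaxis, :]   # explicit (10, 1) + (1, 10)

print(total.shape)                     # (10, 10)
print(np.array_equal(total, same))     # True
print(total[2, 3])                     # 5, i.e. x[2] + x[3]
```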

The point with `concatenate` is that it requires matching dimensions. It does not use broadcasting to adjust dimensions. There are a bunch of `stack` functions that ease this constraint, but they do so by adjusting the dimensions before using `concatenate`. Look at their code (readable Python).
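A short sketch of that constraint, using the same kind of arrays as above (the variable names are illustrative): passing a 2d and a 1d array to `concatenate` raises an error, while adding the axis by hand, or using `np.column_stack`, which adjusts 1d inputs into columns, both work.

```python
import numpy as np

x = np.arange(10)          # shape (10,)
x1 = x.reshape(10, 1)      # shape (10, 1)

# Mixing a 2d and a 1d array fails: concatenate does not broadcast.
try:
    np.concatenate((x1, x), axis=1)
    failed = False
except ValueError:
    failed = True

# Option 1: add the missing axis by hand, then concatenate.
by_hand = np.concatenate((x1, x[:, np.newaxis]), axis=1)

# Option 2: let column_stack turn the 1d input into a column first.
stacked = np.column_stack((x1, x))

print(failed)                              # True
print(by_hand.shape, stacked.shape)        # (10, 2) (10, 2)
```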

So a proficient numpy user needs to be comfortable with that generalized `shape` tuple, including the empty `()` (0d array), `(n,)` 1d, and up. For more advanced stuff understanding strides helps as well (look for example at the strides and shape of a transpose).
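To make the shape tuples concrete, here is a brief sketch (names are illustrative) that prints the shape of a 0d, a 1d and a 2d array, and then shows that a transpose just reverses the shape and strides over the same data. The strides are compared rather than printed, since their byte values depend on the integer dtype of the platform.

```python
import numpy as np

scalar = np.array(5)                  # 0d array
vec = np.arange(6)                    # 1d array
mat = vec.reshape(2, 3)               # 2d array

print(scalar.shape, scalar.ndim)      # () 0
print(vec.shape, vec.ndim)            # (6,) 1
print(mat.shape, mat.ndim)            # (2, 3) 2

# A transpose reverses both shape and strides without copying data.
print(mat.T.shape)                            # (3, 2)
print(mat.T.strides == mat.strides[::-1])     # True
print(np.shares_memory(mat, mat.T))           # True
```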
