Lee Lee - 4 months ago 39
Python Question

What is the output of np.asarray(scalar)?

For a long time, I have used np.array, np.asarray and np.asanyarray to convert array_like lists to arrays.

When converting a scalar to a numpy array, I know that np.atleast_1d(123) gives the right thing, array([123]).

But I'm confused about the output of np.array and np.asarray:


import numpy as np

i = 123
x = np.array(i, dtype=int)  # np.int was removed in NumPy 1.24; the builtin int works
print(repr(x))  # array(123)
print(x.shape)  # ()
print(x.size)   # 1


Since x.shape is an empty tuple, x looks empty to me, so what is array(123)? It's a 0-dimensional array that still contains 123 in its __str__.

A really empty array, with size 0, should be array([]):

print(np.array([]).nbytes)   # 0
print(np.array(123).nbytes)  # 8 with a 64-bit integer dtype
print(type(np.array(123)))   # <class 'numpy.ndarray'>


Apparently they are different: np.array([]) has size 0, while np.array(123) has size 1 even though its shape is ().

Answer

I see this 0d case as a natural continuation of nd. MATLAB makes 2d the lower bound. numpy could have used 1d, but instead chose 0d.

An array consists of a data buffer (where the value bytes are stored), a dtype (how to interpret those bytes), and a shape (plus strides). shape is (displayed as) a tuple. Python allows tuples to have 0, 1, 2 or more elements, so why shouldn't shape have the same flexibility?
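To make the shape-as-tuple point concrete, here is a small sketch that prints those attributes for 0d, 1d, 2d and empty arrays. The helper name describe is my own and purely illustrative. The thing to notice is that size equals the product of the entries of shape, and the product over an empty tuple is 1, so a 0d array holds exactly one element.

```python
import math
import numpy as np

def describe(a):
    """Return (shape, ndim, size, strides) for an array (hypothetical helper)."""
    return a.shape, a.ndim, a.size, a.strides

zero_d = np.array(123, dtype=np.int64)     # shape ()
one_d = np.array([123], dtype=np.int64)    # shape (1,)
two_d = np.array([[123]], dtype=np.int64)  # shape (1, 1)
empty = np.array([], dtype=np.int64)       # shape (0,)

for a in (zero_d, one_d, two_d, empty):
    print(describe(a))

# size is the product of shape; for shape () that product is 1, for (0,) it is 0
sizes_match = all(a.size == math.prod(a.shape)
                  for a in (zero_d, one_d, two_d, empty))
print(sizes_match)
```

Only the array built from [] is empty; the 0d array just has no axes to index along.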

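Look at what atleast_1d does (the function body, wrapped in a def here so it runs on its own):

import numpy as np

def atleast_1d(*arys):
    res = []
    for ary in arys:
        ary = np.asanyarray(ary)      # convert scalars, lists, etc. to arrays
        if len(ary.shape) == 0:       # 0d input
            result = ary.reshape(1)   # view with shape (1,)
        else:
            result = ary              # already 1d or higher, returned unchanged
        res.append(result)
    if len(res) == 1:
        return res[0]                 # single input: return the array itself
    else:
        return res                    # several inputs: return a list of arrays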

It can work with several inputs at once (scalars, arrays, lists, etc.):

In [374]: np.atleast_1d(np.array(1),np.array([1]),np.array([[1]]))
Out[374]: [array([1]), array([1]), array([[1]])]

It converts each input to an array (as needed) and then checks the number of dimensions (the length of shape). If the input is 0d, it reshapes it to (1,). This reshape does not change the data buffer. For the 0d case, atleast_2d does result = ary.reshape(1, 1).
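The claim that the reshape leaves the data buffer alone can be checked directly. This is a minimal sketch using np.shares_memory; the variable names are just for illustration.

```python
import numpy as np

a0 = np.array(123)       # 0d array
a1 = np.atleast_1d(a0)   # reshaped to (1,)
a2 = np.atleast_2d(a0)   # reshaped to (1, 1)

print(a1.shape, a2.shape)

# Both results are views onto the same buffer as the 0d original
print(np.shares_memory(a0, a1), np.shares_memory(a0, a2))

# Writing through the 1d view is therefore visible in the 0d array too
a1[0] = 456
print(a0.item(), a2[0, 0])
```

So atleast_1d on a 0d array gives a new array object, but not a copy of the data.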

You could also use ndmin:

In [382]: np.array(1,ndmin=1)
Out[382]: array([1])

np.array(1) is in many ways like np.int32(1). Both have () shape, both have methods like sum(). The only obvious difference is in their print format.
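A quick side-by-side sketch of that comparison. I've also included one less obvious difference that I'm fairly confident about: the 0d array can be assigned to in place with [()], while a numpy scalar is immutable.

```python
import numpy as np

arr = np.array(1, dtype=np.int32)   # 0d array
sca = np.int32(1)                   # numpy scalar

print(arr.shape, sca.shape)         # both are the empty tuple
print(arr.sum(), sca.sum())         # both have array methods such as sum()
print(type(arr).__name__, type(sca).__name__)
print(repr(arr), repr(sca))         # the print formats differ

# A 0d array is still a mutable container for its single element
mutable = np.array(1, dtype=np.int32)
mutable[()] = 5
print(mutable)

# The scalar does not support item assignment
try:
    sca[()] = 5
    scalar_assignable = True
except TypeError:
    scalar_assignable = False
print(scalar_assignable)
```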

I don't know of any reason to purposefully construct a 0d array. It's just as easy to write np.array([1]) if I really want a 1d array. But you should know how to handle one if it comes up. That includes using .item() to extract the scalar value, and indexing with [()].
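For reference, the two extraction idioms look like this in a minimal sketch. Ordinary integer indexing fails because there is no axis to index.

```python
import numpy as np

x = np.array(123)

value = x.item()   # plain Python int
elem = x[()]       # numpy scalar, obtained by indexing with an empty tuple

print(type(value), type(elem))

# There is no axis 0 on a 0d array, so x[0] raises IndexError
try:
    x[0]
    index_failed = False
except IndexError as exc:
    index_failed = True
    print("x[0] fails:", exc)
```

Use .item() when you want a native Python object, and [()] when you want to stay with numpy types.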

I've encountered it most often in SO questions about loading MATLAB files with scipy.io.loadmat. Some MATLAB constructs are returned as 0d object arrays.
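To show what handling such a result looks like without needing a .mat file, here is a sketch with a hand-built 0d object array. The payload dict is made up for illustration and is only a stand-in; what loadmat actually returns depends on the MATLAB file and the options used.

```python
import numpy as np

# Hypothetical stand-in for a MATLAB construct; built by hand, not by scipy
payload = {"name": "demo", "values": [1, 2, 3]}

# A dict is not a sequence, so numpy wraps it as a single element: a 0d object array
wrapped = np.array(payload, dtype=object)
print(wrapped.shape, wrapped.dtype)

# Either idiom unwraps the Python object stored inside
inner = wrapped.item()
same = wrapped[()]
print(inner["name"], same["values"])
```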

Another way of thinking about a 0d array is that it adds (or retains) the whole suite of array methods to a scalar - without having to explicitly specify the dtype.

I mentioned the similarity to np.int32(1). I've seen it in beginner's code, but have not needed it myself.