Alexander McFarlane - 4 months ago
Python Question

Apply a function to the 0-dimension of an ndarray

Problem




  • I have an ndarray, arr, that is an n-dimensional cube with length m in each dimension.

  • I want to apply a function, func, by slicing along dimension 0 and passing each (n-1)-dimensional slice to the function as its input.



This works with map(), but I can't find an appropriate numpy equivalent. np.vectorize seems to split each (n-1)-tensor into individual scalar entries, and neither apply_along_axis nor apply_over_axes seems appropriate either.

My problem requires passing arbitrary functions as inputs, so I don't think a solution based on einsum is feasible either.

Question




  • Do you know the best numpy alternative to using np.asarray(map(func, arr))?



Example



I define an example array, arr, as a 4-dimensional cube (or 4-tensor):

import numpy as np

m, n = 3, 4
arr = np.arange(m**n).reshape((m,)*n)  # 3x3x3x3 array holding 0..80


I define an example function, f:

def f(x):
    """makes it obvious how the np.ndarray is being passed into the function"""
    try:  # perform an op using x[0,0,0], which is expected to exist
        i = x[0, 0, 0]
    except (IndexError, TypeError):  # raised when x is a scalar, not a 3-d array
        print('\nno element x[0,0,0] in x: \n{}'.format(x))
        return np.nan
    return x - x + i  # same shape as x, filled with x[0,0,0]


The expected result, res, from this function would keep the same shape and would satisfy the following:

print(all([(res[i] == i*m**(n-1)).all() for i in range(m)]))


This works with the built-in map() function:

res = np.asarray(list(map(f, arr)))  # list() is needed in Python 3, where map returns an iterator
print(all([(res[i] == i*m**(n-1)).all() for i in range(m)]))
True


I would expect np.vectorize to work the same way as map(), but it acts on scalar entries:

res = np.vectorize(f)(arr)

no element x[0,0,0] in x:
0
...
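For comparison, NumPy 1.12 and later accept a signature argument to np.vectorize that declares "core" dimensions; the function then receives sub-arrays of that rank instead of scalars, and vectorize loops only over the remaining leading axes. A minimal sketch, assuming NumPy >= 1.12 and a simplified version of f without the try/except; this is still a Python-level loop, so it is a convenience rather than a speed-up:

```python
import numpy as np

m, n = 3, 4
arr = np.arange(m**n).reshape((m,)*n)

def f(x):
    # Simplified version of the question's f: fill the slice with its first element
    return x - x + x[0, 0, 0]

# Three core dimensions in and out, so vectorize iterates only over axis 0
# and passes f one 3-d slice per call (requires NumPy >= 1.12)
f_slices = np.vectorize(f, signature='(i,j,k)->(i,j,k)')
res = f_slices(arr)

print(res.shape)  # (3, 3, 3, 3)
print(all((res[i] == i * m**(n - 1)).all() for i in range(m)))  # True
```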

Answer

Given that arr is 4d, and your function works on 3d arrays,

np.asarray(map(func, arr))

looks perfectly reasonable (under Python 3, wrap the map call in list(), because map returns an iterator there). I'd use the list comprehension form, but that's a matter of programming style:

np.asarray([func(i) for i in arr])

for i in arr iterates on the first dimension of arr. In effect it treats arr as a list of the 3d arrays. And then it reassembles the resulting list into a 4d array.
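A small sketch of that split-and-reassemble step, reusing the question's arr with a simplified f. np.stack is shown alongside as an equivalent way to reassemble that makes the new axis explicit; it is an addition for illustration, not something the question used:

```python
import numpy as np

m, n = 3, 4
arr = np.arange(m**n).reshape((m,)*n)

def f(x):
    # Example 3-d -> 3-d function: fill the slice with its first element
    return x - x + x[0, 0, 0]

# Iterating over a 4-d array yields its 3-d slices along axis 0
slices = [s for s in arr]
print(len(slices), slices[0].shape)  # 3 (3, 3, 3)

# Apply f to each slice and reassemble the results into a 4-d array
res_list = np.asarray([f(s) for s in arr])

# np.stack performs the same reassembly along a chosen new axis
res_stack = np.stack([f(s) for s in arr], axis=0)

print(res_list.shape, np.array_equal(res_list, res_stack))  # (3, 3, 3, 3) True
```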

The np.vectorize docs could be more explicit about the function taking scalars, but yes, it passes values as scalars. Note that np.vectorize has no provision for an iteration-axis parameter. It's most useful when your function takes values from several arrays, something like

 [func(a,b) for a,b in zip(arrA, arrB)]

It generalizes the zip to allow for broadcasting, but otherwise it is an iterative solution. It knows nothing about the internals of your func, so it can't speed up its calls.
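To illustrate the zip comparison, here is a sketch with a hypothetical two-argument func and two small 1-d arrays: on equal-length inputs np.vectorize matches the zip loop, and it additionally broadcasts a (3, 1) column against a (3,) row, still calling func once per scalar pair:

```python
import numpy as np

def func(a, b):
    # Hypothetical element-wise function; called once per pair of scalars
    return a * 10 + b

arrA = np.array([1, 2, 3])
arrB = np.array([4, 5, 6])

# Plain zip: pairs the elements of two equal-length 1-d arrays
zipped = np.array([func(a, b) for a, b in zip(arrA, arrB)])

# np.vectorize gives the same element-wise pairing...
vfunc = np.vectorize(func)
print(np.array_equal(vfunc(arrA, arrB), zipped))  # True

# ...and also broadcasts: a (3, 1) column against a (3,) row gives (3, 3)
table = vfunc(arrA[:, None], arrB)
print(table.shape)  # (3, 3)
print(table[2, 0])  # 34
```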

np.vectorize ends up calling np.frompyfunc, which, being a bit less general, is a bit faster. But it too passes scalars to func.
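A minimal sketch of np.frompyfunc with the same hypothetical func: np.frompyfunc(func, nin, nout) builds a ufunc that broadcasts like np.vectorize but always returns an object-dtype array, so a numeric dtype has to be restored by hand:

```python
import numpy as np

def func(a, b):
    # Hypothetical element-wise function; receives one pair of scalars per call
    return a * 10 + b

arrA = np.array([1, 2, 3])
arrB = np.array([4, 5, 6])

# 2 inputs, 1 output; the result dtype is always object
ufunc = np.frompyfunc(func, 2, 1)
out = ufunc(arrA[:, None], arrB)
print(out.dtype)  # object

# Convert explicitly to a numeric dtype when needed
out_int = out.astype(int)
print(np.array_equal(out_int, np.vectorize(func)(arrA[:, None], arrB)))  # True
```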

np.apply_along_axis and np.apply_over_axes also iterate over one or more axes. You may find their code instructive, but I agree they don't apply here.
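A quick way to see why they don't fit is to record what they pass in. In this sketch (the recording helper is hypothetical), np.apply_along_axis hands the function 1-d slices taken along the given axis, and np.apply_over_axes calls func(a, axis) on the whole array once per listed axis, keeping reduced axes as length-1 dimensions; neither delivers the (n-1)-d slices the question needs:

```python
import numpy as np

m, n = 3, 4
arr = np.arange(m**n).reshape((m,)*n)

seen_shapes = []

def record(x):
    # Hypothetical helper: note the shape of each input, then reduce it
    seen_shapes.append(x.shape)
    return x.sum()

# apply_along_axis passes 1-d slices along axis 0, not 3-d slices
out = np.apply_along_axis(record, 0, arr)
print(set(seen_shapes))  # {(3,)}
print(out.shape)  # (3, 3, 3)

# apply_over_axes calls np.sum(a, axis) for each axis in turn
total = np.apply_over_axes(np.sum, arr, [1, 2, 3])
print(total.shape)  # (3, 1, 1, 1)
```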

A variation on the map approach is to allocate the result array and index into it:

In [45]: res=np.zeros_like(arr,int)
In [46]: for i in range(arr.shape[0]):
    ...:     res[i,...] = f(arr[i,...])

This may be easier if you need to iterate on an axis other than the first.
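A sketch of that generalization, using a hypothetical helper apply_over_axis that builds the index tuple for an arbitrary axis. It assumes func returns an array with the same shape as its input slice. The np.moveaxis version at the end is shown only as a cross-check: move the target axis to the front, iterate, then move it back.

```python
import numpy as np

def apply_over_axis(func, arr, axis=0):
    """Hypothetical helper: call func on each (n-1)-d slice taken along `axis`.

    Assumes func returns an array with the same shape as its input slice.
    """
    res = np.empty_like(arr)
    for i in range(arr.shape[axis]):
        # Index like (slice(None), slice(None), i, slice(None)) for axis=2
        idx = [slice(None)] * arr.ndim
        idx[axis] = i
        idx = tuple(idx)
        res[idx] = func(arr[idx])
    return res

m, n = 3, 4
arr = np.arange(m**n).reshape((m,)*n)

def f(x):
    # Fill the slice with its first element
    return x - x + x[0, 0, 0]

res0 = apply_over_axis(f, arr, axis=0)
res2 = apply_over_axis(f, arr, axis=2)

# Cross-check: move axis 2 to the front, iterate, then move it back
alt = np.moveaxis(np.asarray([f(s) for s in np.moveaxis(arr, 2, 0)]), 0, 2)
print(np.array_equal(res2, alt))  # True
```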

You need to do your own timings to see which is faster.
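One way to run such timings is with the standard-library timeit module. This sketch wraps the three approaches discussed above in hypothetical functions, checks that they agree, and prints their times; no numbers are quoted because they depend on the machine, the array size, and the cost of f:

```python
import timeit
import numpy as np

m, n = 3, 4
arr = np.arange(m**n).reshape((m,)*n)

def f(x):
    return x - x + x[0, 0, 0]

def with_map():
    return np.asarray(list(map(f, arr)))

def with_listcomp():
    return np.asarray([f(s) for s in arr])

def with_prealloc():
    res = np.zeros_like(arr)
    for i in range(arr.shape[0]):
        res[i, ...] = f(arr[i, ...])
    return res

# Make sure all three approaches agree before timing them
assert np.array_equal(with_map(), with_listcomp())
assert np.array_equal(with_map(), with_prealloc())

for fn in (with_map, with_listcomp, with_prealloc):
    t = timeit.timeit(fn, number=1000)
    print(f"{fn.__name__}: {t:.4f} s for 1000 calls")
```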

========================

An example of iteration over the 1st dimension with in-place modification:

In [58]: arr.__array_interface__['data']  # data buffer address
Out[58]: (152720784, False)

In [59]: for i,a in enumerate(arr):
    ...:     print(a.__array_interface__['data'])
    ...:     a[0,0,:]=i
    ...:     
(152720784, False)   # address of the views (same buffer)
(152720892, False)
(152721000, False)

In [60]: arr
Out[60]: 
array([[[[ 0,  0,  0],
         [ 3,  4,  5],
         [ 6,  7,  8]],

        ...

       [[[ 1,  1,  1],
         [30, 31, 32],
         ...

       [[[ 2,  2,  2],
         [57, 58, 59],
         [60, 61, 62]],
       ...]]])

When I iterate over an array, I get views that start at successive points in the common data buffer. If I modify a view, as above or with a[:] = ..., I modify the original; I don't have to write anything back. But don't use a = ..., which rebinds the name a to a new object and breaks the link to the original array.
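The same view behavior can be checked without inspecting buffer addresses. In this sketch, np.shares_memory confirms that each iterated slice is a view of arr, an in-place write changes arr, rebinding the loop variable leaves arr untouched, and slice assignment writes through:

```python
import numpy as np

arr = np.arange(3**4).reshape((3,) * 4)

for i, a in enumerate(arr):
    # Each `a` is a view into arr's data buffer, not a copy
    assert np.shares_memory(a, arr)
    a[0, 0, :] = i  # in-place write: modifies arr as well

print(arr[1, 0, 0])  # [1 1 1]

for a in arr:
    a = a * 0  # rebinding: `a` now names a new array, arr is unchanged

print(arr[1, 0, 0])  # [1 1 1]

for a in arr:
    a[:] = 0  # slice assignment writes through the view

print(arr.any())  # False
```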