johnbaltis - 24 days ago 10
Python Question

# Take a 1D list of results and convert it to a N-D xarray.DataArray

This is how I acquire my N-D data (`func` is IRL not vectorizable):

``````import numpy
import xarray
import itertools

xs = numpy.linspace(0, 10, 100)
ys = numpy.linspace(0, 0.1, 20)
zs = numpy.linspace(0, 5, 200)

def func(x, y, z):
    return x * y / z

vals = list(itertools.product(xs, ys, zs))
result = [func(x, y, z) for x, y, z in vals]
``````

I have a feeling that what I do can be simplified. I would like to put this in an `xarray.DataArray` without reshaping the data. However, this is how I do it now:

``````arr = numpy.array(result).reshape(len(xs), len(ys), len(zs))
da = xarray.DataArray(arr, coords=[('x', xs), ('y', ys), ('z', zs)])
``````

This is a simple example, but usually I work with ~10D data that I obtain by mapping over an `itertools.product` (in parallel).

My question: how can I do this without reshaping my data and by using `vals` and without taking the lengths of `xs`, `ys`, and `zs`?

In a similar way to what you would do with:

``````import pandas

index = pandas.MultiIndex.from_tuples(vals, names=['x', 'y', 'z'])
df = pandas.DataFrame(result, columns=['result'], index=index)
``````

Experienced `numpy` users tend to focus on removing iterative steps. Thus we've zoomed in on your `result` calculation, and view the `reshape` as something trivial. Hence the answers so far have focused on broadcasting and calculating your function.

But I'm beginning to suspect that what's really bothering you is that

``````reshape(len(xs), len(ys), len(zs))
``````

could become unwieldy if you have 10 such dimensions, not just 3. It's not so much the calculation speed, but the effort required to type `len(...)` 10 times. Or maybe it's that the code will look ugly.

Anyway, here's a way of bypassing all that typing. The key is to collect the coordinate arrays for each dimension in a list:

``````In [495]: dims = [np.linspace(0,10,4), np.linspace(0,.1,3), np.linspace(0,5,5)]
In [496]: from itertools import product
In [497]: vals = list(product(*dims))
In [498]: len(vals)
Out[498]: 60
In [499]: result = [sum(ijk) for ijk in vals] # a simple func
``````

Now just get the lengths with a simple list comprehension:

``````In [501]: arr=np.array(result).reshape([len(i) for i in dims])
In [502]: arr.shape
Out[502]: (4, 3, 5)
``````
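
The reshape lines up with the grid because `itertools.product` varies its last argument fastest, which is the same order that numpy's default (C-order) `reshape` assumes. Here is a minimal sketch that checks this. The `func` below is only a hypothetical stand-in for the real, non-vectorizable function:

```python
import itertools
import numpy as np

# Same small grid as in the session above.
dims = [np.linspace(0, 10, 4), np.linspace(0, 0.1, 3), np.linspace(0, 5, 5)]

def func(x, y, z):
    # Hypothetical stand-in for the real, non-vectorizable function.
    return x + y + z

vals = list(itertools.product(*dims))
result = [func(*v) for v in vals]

# The shape comes straight from the list of axes, no hand-typed len() calls.
shape = tuple(len(d) for d in dims)
arr = np.array(result).reshape(shape)

# Spot check: element [i, j, k] is func at the i-th x, j-th y and k-th z.
i, j, k = 2, 1, 3
value_ok = bool(np.isclose(arr[i, j, k], func(dims[0][i], dims[1][j], dims[2][k])))
```

The same pattern should carry over to 10 axes unchanged, since nothing in it depends on the number of entries in `dims`.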

Another possibility is to put the `linspace` parameters in lists right at the start.

``````In [504]: ldims=[4,3,5]
In [505]: ends=[10,.1,5]
In [506]: dims=[np.linspace(0,e,l) for e,l in zip(ends, ldims)]
In [507]: vals = list(product(*dims))
In [508]: result=[sum(ijk) for ijk in vals]
In [509]: arr=np.array(result).reshape(ldims)
``````

`reshape` itself is not an expensive operation. Usually it creates a view, which is one of the fastest things you can do with an array.
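
A quick way to convince yourself of that is to check whether the reshaped array shares memory with the flat one. This is a small sketch, with `flat` standing in for `np.array(result)`:

```python
import numpy as np

# Stand-in for np.array(result): a contiguous 1d array of 60 values.
flat = np.arange(60, dtype=float)
arr = flat.reshape(4, 3, 5)

# For a contiguous array, reshape hands back a view onto the same buffer.
is_view = np.shares_memory(flat, arr)

# Writing through the view is visible in the original 1d array.
arr[0, 0, 1] = -1.0
changed_in_flat = flat[1] == -1.0
```

If the array were not contiguous (for example after certain transposes), `reshape` may have to copy instead, which is why I say "usually".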

`@Divakar` hinted at this kind of solution in his deleted answer, with `*np.meshgrid(*A)` as an alternative to your `product(xs,ys)`.
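
To illustrate the `meshgrid` route, here is a sketch under the assumption that the function can be vectorized (which the question says is not the case for the real `func`, so treat it as a comparison only). It uses the simple sum from the session above:

```python
import itertools
import numpy as np

dims = [np.linspace(0, 10, 4), np.linspace(0, 0.1, 3), np.linspace(0, 5, 5)]

# indexing='ij' keeps the axis order (x, y, z); the default 'xy' swaps the first two.
grids = np.meshgrid(*dims, indexing='ij')

# Vectorized x + y + z over the whole grid; each grid already has shape (4, 3, 5).
arr_grid = sum(grids)

# Reference: the product-and-reshape route from above.
result = [sum(v) for v in itertools.product(*dims)]
arr_loop = np.array(result).reshape([len(d) for d in dims])

same = bool(np.allclose(arr_grid, arr_loop))
```

Both routes give the same array here, so the choice mostly depends on whether the function can take whole arrays.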

By the way, my answer doesn't involve `xarray` either - because I don't have that package installed. I'm assuming that you know what you are doing when passing `arr` of that 3d shape to it, as opposed to the longer 1d array. Look at the tag numbers: 5k followers for `numpy`, 23 for `xarray`.

The `xarray` `coords` parameter could also be constructed from `dims` (with an additional list of names).
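
Something along these lines might work. It is only a sketch: the `names` list is an assumption taken from the axis names in the question, `arr` is a placeholder for the reshaped results, and the `xarray.DataArray` call simply mirrors the one in the question, so I have guarded it in case the package is missing:

```python
import numpy as np

dims = [np.linspace(0, 10, 4), np.linspace(0, 0.1, 3), np.linspace(0, 5, 5)]
names = ['x', 'y', 'z']  # assumed axis names, as used in the question

# Pair each name with its axis values: [('x', xs), ('y', ys), ('z', zs)].
coords = list(zip(names, dims))
shape = [len(d) for d in dims]

# Placeholder for np.array(result).reshape(shape).
arr = np.zeros(shape)

try:
    import xarray
    # Same call form as in the question, with coords built from the lists.
    da = xarray.DataArray(arr, coords=coords)
except ImportError:
    da = None  # xarray not installed; coords and shape are still usable
```

With 10 dimensions only `dims` and `names` grow; the rest stays the same.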

If this answer isn't to your liking, I'd suggest closing the question, and starting a new one with just the `xarray` tag. That way you won't attract the numerous `numpy` flies.