johnbaltis johnbaltis - 1 year ago 50
Python Question

Take a 1D list of results and convert it to a N-D xarray.DataArray

This is how I acquire my N-D data (func is IRL not vectorizable):

import numpy
import xarray
import itertools

xs = numpy.linspace(0, 10, 100)
ys = numpy.linspace(0, 0.1, 20)
zs = numpy.linspace(0, 5, 200)

def func(x, y, z):
    return x * y / z

vals = list(itertools.product(xs, ys, zs))
result = [func(x, y, z) for x, y, z in vals]

I have a feeling that what I do can be simplified. I would like to put this in an xarray.DataArray without reshaping the data. However, this is how I do it now:

arr = numpy.array(result).reshape(len(xs), len(ys), len(zs))
da = xarray.DataArray(arr, coords=[('x', xs), ('y', ys), ('z', zs)])

This is a simple example, but usually I work with ~10D data that I obtain by mapping a function (in parallel).

My question: how can I do this without reshaping my data, by using vals, and without taking the lengths of xs, ys, and zs?

In a similar way to what you would do with:

import pandas

index = pandas.MultiIndex.from_tuples(vals, names=['x', 'y', 'z'])
df = pandas.DataFrame(result, columns=['result'], index=index)

Answer Source

Experienced numpy users tend to focus on removing iterative steps. Thus we've zoomed in on your result calculation, and view the reshape as something trivial. Hence the answers so far have focused on broadcasting and calculating your function.

But I'm beginning to suspect that what's really bothering you is that

reshape(len(xs), len(ys), len(zs))

could become unwieldy if you have 10 such dimensions, not just 3. It's not so much the calculation speed, but the effort required to type len(..) 10 times. Or maybe it's that the code will look ugly.

Anyways here's a way of bypassing all that typing. The key is to collect the dimensional arrays in a list

In [495]: dims = [np.linspace(0,10,4), np.linspace(0,.1,3), np.linspace(0,5,5)]
In [496]: from itertools import product
In [497]: vals = list(product(*dims))
In [498]: len(vals)
Out[498]: 60
In [499]: result = [sum(ijk) for ijk in vals] # a simple func

Now just get the len's with a simple list comprehension:

In [501]: arr=np.array(result).reshape([len(i) for i in dims])
In [502]: arr.shape
Out[502]: (4, 3, 5)

Another possibility is to put the linspace parameters in lists right at the start.

In [504]: ldims=[4,3,5]
In [505]: ends=[10,.1,5]
In [506]: dims=[np.linspace(0,e,l) for e,l in zip(ends, ldims)]
In [507]: vals = list(product(*dims))
In [508]: result=[sum(ijk) for ijk in vals]
In [509]: arr=np.array(result).reshape(ldims)

reshape itself is not an expensive operation. Usually it creates a view, which is one of the fastest things you can do with an array.
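
The view behaviour can be checked directly. This is a minimal sketch, assuming a contiguous 1D array standing in for np.array(result); the names flat, shares and first are made up for the illustration.

```python
import numpy as np

# Stand-in for np.array(result): 60 values, matching the 4*3*5 grid above.
flat = np.arange(60.0)

# Reshaping a contiguous array returns a view onto the same buffer.
arr = flat.reshape(4, 3, 5)

# True when both arrays share the underlying memory (no copy was made).
shares = np.shares_memory(flat, arr)

# Writing through the reshaped view is visible in the original 1D array.
arr[0, 0, 0] = -1.0
first = flat[0]
```

If shares is True and first picks up the new value, the reshape did not copy any data.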

@Divakar hinted at this kind of solution in his deleted answer, with *np.meshgrid(*A) as an alternative to your product(xs,ys).
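
A rough sketch of that meshgrid route, assuming the function can be applied to whole arrays (which the question says is not the case for the real func, so this only illustrates the ordering). With indexing='ij' the grids come out in the same (4, 3, 5) layout that product plus reshape gives; arr_grid and arr_loop are names chosen for the comparison.

```python
import numpy as np
from itertools import product

dims = [np.linspace(0, 10, 4), np.linspace(0, .1, 3), np.linspace(0, 5, 5)]

# One coordinate grid per dimension; indexing='ij' keeps the axis order
# of dims, so every grid has shape (4, 3, 5).
grids = np.meshgrid(*dims, indexing='ij')

# The same simple func as above (a sum), applied to whole arrays at once.
arr_grid = sum(grids)

# The product + reshape version, for comparison.
arr_loop = np.array([sum(ijk) for ijk in product(*dims)])
arr_loop = arr_loop.reshape([len(d) for d in dims])

same = np.allclose(arr_grid, arr_loop)
```

Here same should come out True, which also confirms that the order produced by product(*dims) matches a plain row-major reshape.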

By the way, my answer doesn't involve xarray either - because I don't have that package installed. I'm assuming that you know what you are doing when passing arr of that 3d shape to it, as opposed to the longer 1d array. Look at the tag numbers, 5k followers for numpy, 23 for xarray.

The xarray coords parameter could also be constructed from dims (with an additional list of names).
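
A minimal sketch of that idea, assuming the list-of-(name, values) form of coords used in the question. The names list and the helper to_dataarray are hypothetical, introduced only for this illustration; xarray is imported inside the helper so the rest of the snippet runs without it.

```python
import numpy as np
from itertools import product

names = ['x', 'y', 'z']  # hypothetical dimension names, one per entry in dims
dims = [np.linspace(0, 10, 4), np.linspace(0, .1, 3), np.linspace(0, 5, 5)]

result = [sum(ijk) for ijk in product(*dims)]  # a simple func, as above
arr = np.array(result).reshape([len(d) for d in dims])

# Same structure as coords=[('x', xs), ('y', ys), ('z', zs)] in the question.
coords = list(zip(names, dims))

def to_dataarray(arr, coords):
    # Hypothetical helper; requires xarray to be installed.
    import xarray
    return xarray.DataArray(arr, coords=coords)
```

Calling to_dataarray(arr, coords) should then give the same kind of DataArray as in the question, without typing any len(...) or coordinate name more than once.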

If this answer isn't to your liking, I'd suggest closing the question, and starting a new one with just the xarray tag. That way you won't attract the numerous numpy flies.