johnbaltis - 1 year ago 50

Python Question

This is how I acquire my N-D data with `func`:

```
import numpy
import xarray
import itertools

xs = numpy.linspace(0, 10, 100)
ys = numpy.linspace(0, 0.1, 20)
zs = numpy.linspace(0, 5, 200)

def func(x, y, z):
    return x * y / z  # note: zs[0] == 0, so z = 0 yields inf/nan

# evaluate func at every (x, y, z) combination
vals = list(itertools.product(xs, ys, zs))
result = [func(x, y, z) for x, y, z in vals]
```

I have a feeling that what I do can be simplified. I would like to put this in an `xarray.DataArray`, which I currently do by reshaping:

```
arr = numpy.array(result).reshape(len(xs), len(ys), len(zs))
da = xarray.DataArray(arr, coords=[('x', xs), ('y', ys), ('z', zs)])
```

This is a simple example, but usually I work with ~10D data that I obtain by mapping a function over an `itertools.product`.

My question: how can I do this without reshaping my data, working directly from `vals`, `xs`, `ys` and `zs`? Something similar to what you would do with pandas:

```
import pandas

index = pandas.MultiIndex.from_tuples(vals, names=['x', 'y', 'z'])
df = pandas.DataFrame(result, columns=['result'], index=index)
```

Answer

Experienced `numpy` users tend to focus on removing iterative steps. Thus we've zoomed in on your `result` calculation and viewed the `reshape` as something trivial. Hence the answers so far have focused on broadcasting and on calculating your function.
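A minimal sketch of that broadcasting approach, reusing the question's grids and `func`: inserting length-1 axes with `None` lets NumPy evaluate `func` on the whole 3-D grid in one call, so neither `itertools.product` nor `reshape` is needed. This assumes `func` is built from array-friendly operations; a function that only accepts scalars would not work this way.

```python
import numpy as np

# Same grids and function as in the question
xs = np.linspace(0, 10, 100)
ys = np.linspace(0, 0.1, 20)
zs = np.linspace(0, 5, 200)

def func(x, y, z):
    return x * y / z

# Give each grid its own axis: shapes (100, 1, 1), (1, 20, 1), (1, 1, 200).
# Broadcasting expands them to the full (100, 20, 200) grid inside func.
X = xs[:, None, None]
Y = ys[None, :, None]
Z = zs[None, None, :]

# zs[0] == 0, so silence the divide-by-zero and 0/0 warnings for that slice
with np.errstate(divide="ignore", invalid="ignore"):
    arr = func(X, Y, Z)

print(arr.shape)  # (100, 20, 200)
```

For many dimensions the `None` placements get tedious to write by hand; `np.meshgrid(..., indexing='ij', sparse=True)` produces the same broadcastable shapes.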

But I'm beginning to suspect that what's really bothering you is that

```
reshape(len(xs), len(ys), len(zs))
```

could become unwieldy if you have 10 such dimensions, not just 3. It's not so much the calculation speed as the effort required to type `len(..)` 10 times. Or maybe it's that the code will look ugly.

Anyway, here's a way of bypassing all that typing. The key is to collect the dimensional arrays in a list:

```
In [495]: dims = [np.linspace(0,10,4), np.linspace(0,.1,3), np.linspace(0,5,5)]
In [496]: from itertools import product
In [497]: vals = list(product(*dims))
In [498]: len(vals)
Out[498]: 60
In [499]: result = [sum(ijk) for ijk in vals] # a simple func
```

Now just get the `len`s with a simple list comprehension:

```
In [501]: arr=np.array(result).reshape([len(i) for i in dims])
In [502]: arr.shape
Out[502]: (4, 3, 5)
```
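To make the pattern independent of how many dimensions there are, the product and reshape steps can be wrapped in a small function. `grid_apply` below is a hypothetical helper name, not an existing NumPy function; it assumes `func` takes one positional argument per dimension and returns a scalar.

```python
from itertools import product

import numpy as np

def grid_apply(func, dims):
    """Evaluate func at every point of the Cartesian product of dims and
    return an array with one axis per entry in dims (hypothetical helper)."""
    shape = [len(d) for d in dims]  # one axis per dimension
    flat = [func(*point) for point in product(*dims)]
    return np.array(flat).reshape(shape)

# Same code for 3 or 10 dimensions; here 5 small ones.
dims = [np.linspace(0, 1, n) for n in (2, 3, 4, 2, 3)]
arr = grid_apply(lambda *args: sum(args), dims)
print(arr.shape)  # (2, 3, 4, 2, 3)
```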

Another possibility is to put the `linspace` parameters in lists right at the start:

```
In [504]: ldims=[4,3,5]
In [505]: ends=[10,.1,5]
In [506]: dims=[np.linspace(0,e,l) for e,l in zip(ends, ldims)]
In [507]: vals = list(product(*dims))
In [508]: result=[sum(ijk) for ijk in vals]
In [509]: arr=np.array(result).reshape(ldims)
```

`reshape` itself is not an expensive operation. Usually it creates a view, which is one of the fastest things you can do with an array.
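A quick way to see that `reshape` is cheap: the reshaped array shares its buffer with the flat one. The real copying happens earlier, in `np.array(result)`, which has to build a new array from the Python list. The list below is just a stand-in for your `result`.

```python
import numpy as np

result = [float(i) for i in range(60)]  # stand-in for the list of func values

flat = np.array(result)        # copies the Python list into a new array
arr = flat.reshape([4, 3, 5])  # no data copied: a view onto flat's buffer

print(np.shares_memory(flat, arr))  # True
print(arr.base is flat)             # True
```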

@Divakar hinted at this kind of solution in his deleted answer, using `*np.meshgrid(*A)` as an alternative to your `product(xs,ys)`.
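Here is my reading of the `meshgrid` alternative (a sketch, not Divakar's actual code, which is no longer visible): with `indexing='ij'` each returned grid already has the shape `(len(d0), len(d1), ...)`, so a vectorized function can be applied to the grids directly, and flattening them reproduces the `product` ordering.

```python
from itertools import product

import numpy as np

dims = [np.linspace(0, 10, 4), np.linspace(0, .1, 3), np.linspace(0, 5, 5)]

# indexing="ij" keeps axis k aligned with dims[k];
# the default "xy" would swap the first two axes.
grids = np.meshgrid(*dims, indexing="ij")
print(grids[0].shape)  # (4, 3, 5)

# Each grid already has the final N-D shape, so a vectorized function
# applies directly, e.g. the simple sum used above:
arr = sum(grids)

# Flattening the grids reproduces the itertools.product ordering.
vals_mesh = np.stack([g.ravel() for g in grids], axis=1)
vals_prod = np.array(list(product(*dims)))
print(np.array_equal(vals_mesh, vals_prod))  # True
```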

By the way, my answer doesn't involve `xarray` either, because I don't have that package installed. I'm assuming that you know what you are doing when passing `arr` of that 3d shape to it, as opposed to the longer 1d array. Look at the tag numbers: 5k followers for `numpy`, 23 for `xarray`.

The `xarray` `coords` parameter could also be constructed from `dims` (with an additional list of names).
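A sketch of building `coords` from `dims`, assuming the `(name, values)` list-of-pairs form the question passes to `xarray.DataArray`. I haven't run this against `xarray` itself, so the final call is left as a comment; `names` is the extra list of dimension names.

```python
import numpy as np

dims = [np.linspace(0, 10, 4), np.linspace(0, .1, 3), np.linspace(0, 5, 5)]
names = ["x", "y", "z"]  # the additional list of dimension names

# Same (name, values) pairs the question writes out by hand.
coords = list(zip(names, dims))
shape = [len(values) for _, values in coords]

arr = np.zeros(shape)  # stand-in for the reshaped result array
# Assumption, untested here: with xarray installed this would be
#   da = xarray.DataArray(arr, coords=coords)
```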

If this answer isn't to your liking, I'd suggest closing the question and starting a new one with just the `xarray` tag. That way you won't attract the numerous `numpy` flies.