Maximilian - 3 months ago
Python Question

Concise way to filter data in xarray

I need to apply a very simple 'match statement' to the values in an xarray array:


  1. Where the value > 0, make 2

  2. Where the value == 0, make 0

  3. Where the value is NaN, make NaN



Here's my current solution. I'm using NaNs, .fillna, & type coercion in lieu of 2d indexing.

valid = date_by_items.notnull()
positive = date_by_items > 0
positive = positive * 2
result = positive.fillna(0.).where(valid)
result


This changes this:

In [20]: date_by_items = xr.DataArray(np.asarray((list(range(3)) * 10)).reshape(6,5), dims=('date','item'))
...: date_by_items
...:
Out[20]:
<xarray.DataArray (date: 6, item: 5)>
array([[0, 1, 2, 0, 1],
[2, 0, 1, 2, 0],
[1, 2, 0, 1, 2],
[0, 1, 2, 0, 1],
[2, 0, 1, 2, 0],
[1, 2, 0, 1, 2]])
Coordinates:
* date (date) int64 0 1 2 3 4 5
* item (item) int64 0 1 2 3 4


... to this:

Out[22]:
<xarray.DataArray (date: 6, item: 5)>
array([[ 0., 2., 2., 0., 2.],
[ 2., 0., 2., 2., 0.],
[ 2., 2., 0., 2., 2.],
[ 0., 2., 2., 0., 2.],
[ 2., 0., 2., 2., 0.],
[ 2., 2., 0., 2., 2.]])
Coordinates:
* date (date) int64 0 1 2 3 4 5
* item (item) int64 0 1 2 3 4


In pandas, df[df>0] = 2 would be enough. Surely I'm doing something pedestrian and there's a terser way?

Answer

If you are happy to load your data in-memory as a NumPy array, you can modify the DataArray values in place with NumPy:

date_by_items.values[date_by_items.values > 0] = 2
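As a minimal sketch of the in-place approach, here is a small self-contained version that also includes a NaN. The sample data and the name result are made up for illustration, and the copy is only there to keep the original array untouched. It assumes the data is backed by a plain in-memory NumPy array:

```python
import numpy as np
import xarray as xr

# Hypothetical sample data: floats, so that NaN can be represented.
date_by_items = xr.DataArray(
    np.array([[0.0, 1.0, np.nan], [2.0, 0.0, 3.0]]),
    dims=("date", "item"),
)

# Work on a copy so the original array is left unchanged (optional).
result = date_by_items.copy()

# NaN > 0 evaluates to False, so NaN entries are not selected and stay NaN;
# zeros are not selected either and stay 0.
mask = result.values > 0
result.values[mask] = 2
```

Because the comparison is False for both zeros and NaNs, a single assignment should already cover all three rules from the question for this kind of data.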

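The cleanest way to handle this would be if xarray supported the other argument to where, but we haven't implemented that yet (hopefully soon -- the groundwork has been laid!). When that works, you'll be able to write date_by_items.where(~(date_by_items > 0), 2). Note that where keeps the values where the condition is True and substitutes other elsewhere, so the condition has to describe the values you want to keep.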
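For readers on a later xarray release, the following sketch assumes that where accepts an other argument and that the top-level xr.where(cond, x, y) function is available; neither was available when this answer was written. The sample data and the names keep, result_where and result_func are illustrative only. Keep in mind that where keeps values where the condition is True and uses other elsewhere:

```python
import numpy as np
import xarray as xr

# Hypothetical sample data: floats, so that NaN can be represented.
date_by_items = xr.DataArray(
    np.array([[0.0, 1.0, np.nan], [2.0, 0.0, 3.0]]),
    dims=("date", "item"),
)

# Condition for the values to KEEP: everything that is not positive.
# NaN > 0 is False, so NaN entries are kept as NaN.
keep = ~(date_by_items > 0)
result_where = date_by_items.where(keep, 2)

# Equivalent formulation (assumed available): pick 2 where the value is
# positive, otherwise fall back to the original value.
result_func = xr.where(date_by_items > 0, 2, date_by_items)
```

Both forms return a new array rather than modifying date_by_items, which also makes them a better fit than .values assignment when the data is not held in memory as a NumPy array.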

Either way, you'll need to do this twice to apply both your criteria.
