jhegedus - 6 months ago 24

Python Question

Inspired by Haskell:

How can I implement the following with a numpy array in Python?

`In [13]: [(x if x>3 else None) for x in range(10)]`

`Out[13]: [None, None, None, None, 4, 5, 6, 7, 8, 9]`

In other words, I am looking for a function for numpy that would have the signature:

`f:[a]->(a->Maybe a)->[Maybe a]`

`[a]`

I was trying this:

`np.apply_along_axis(lambda x:x if x>3 else None,0,np.arange(10))`

but it does not work:

`ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()`
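For what it's worth, the error seems to come from `np.apply_along_axis` handing the callback whole 1-D slices rather than single elements, so `x > 3` is an array and `if` cannot reduce it to one truth value. The sketch below records what the callback receives and then shows one element-wise alternative, `np.vectorize` with an object output type. The names `seen`, `record_shape` and `maybe_gt3` are made up for illustration.

```python
import numpy as np

arr = np.arange(10)

# Hypothetical helper: remember the shape of whatever the callback is given.
seen = []

def record_shape(x):
    seen.append(x.shape)
    return x

np.apply_along_axis(record_shape, 0, arr)
# seen now holds (10,): the callback got the full 1-D array, not one element,
# which is why `x if x > 3 else None` raises the ambiguous-truth-value error.

# Element-wise alternative: np.vectorize applies the lambda to each element.
# otypes=[object] is needed because the result mixes None and integers.
maybe_gt3 = np.vectorize(lambda x: x if x > 3 else None, otypes=[object])
result = maybe_gt3(arr)
```

Keep in mind that `np.vectorize` is essentially a Python-level loop, so it mirrors the list comprehension in behavior but not in NumPy-style speed.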

Answer

NumPy's `where()` will do the trick:

```
In [429]: import numpy as np
In [430]: arr = np.arange(10, dtype=object)  # the np.object alias was removed in NumPy 1.24
In [431]: np.where(arr > 3, arr, None)
Out[431]: array([None, None, None, None, 4, 5, 6, 7, 8, 9], dtype=object)
```

The code above creates a new array. If you wish to modify `arr` **in place**, you could use boolean indexing `arr[arr < 4] = None` (as pointed out by @Chris Mueller) or `putmask()`:

```
In [432]: np.putmask(arr, arr < 4, None)
In [433]: arr
Out[433]: array([None, None, None, None, 4, 5, 6, 7, 8, 9], dtype=object)
```
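The boolean indexing variant is only mentioned inline above, so here is a minimal sketch of it, assuming the array has `object` dtype. The names `alias`, `ints` and `int_assignment_failed` are illustrative only. It also shows why the dtype matters: a plain integer array cannot store `None`.

```python
import numpy as np

arr = np.arange(10).astype(object)   # object dtype so that None can be stored
alias = arr                          # second name bound to the same array

arr[arr < 4] = None                  # boolean-mask assignment, modifies arr in place

# An integer array has no slot for None, so the same assignment raises.
ints = np.arange(10)
try:
    ints[ints < 4] = None
    int_assignment_failed = False
except (TypeError, ValueError):
    int_assignment_failed = True
```

Because the assignment happens in place, `alias` sees the change too; no new array is allocated for the result.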

Unless you are constrained to use `None` as a "flag" value, I would suggest following @ev-br's recommendation and using `np.nan` instead. Since `np.nan` is a float, the array then needs a floating-point dtype. I will follow that approach to assess performance:

```
In [434]: arr = np.arange(1000000, dtype=float)  # the np.float alias was removed in NumPy 1.24
In [435]: timeit np.where(arr > 3, arr, np.nan)
100 loops, best of 3: 3.61 ms per loop
In [436]: timeit arr[arr < 4] = np.nan
1000 loops, best of 3: 564 µs per loop
In [437]: timeit np.putmask(arr, arr < 4, np.nan)
1000 loops, best of 3: 1.08 ms per loop
```

Notice that I used a much larger array to further highlight efficiency differences. And the winner is... boolean indexing.
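If you want to repeat the comparison outside IPython, something along these lines should work. It is only a sketch: the function names are invented here, and the in-place variants copy the array first so that every run starts from the same data, which adds a constant overhead to those two timings. Absolute numbers will depend on your machine and NumPy version.

```python
import timeit
import numpy as np

base = np.arange(1_000_000, dtype=float)

def with_where():
    # Allocates and returns a new array; base is left untouched.
    return np.where(base > 3, base, np.nan)

def with_mask():
    arr = base.copy()                 # fresh copy, then boolean-mask assignment
    arr[arr < 4] = np.nan
    return arr

def with_putmask():
    arr = base.copy()                 # fresh copy, then np.putmask in place
    np.putmask(arr, arr < 4, np.nan)
    return arr

for fn in (with_where, with_mask, with_putmask):
    # Best of 3 repeats, 10 calls each, reported per call.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.3f} ms per call")
```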