jhegedus - 1 year ago 117
Python Question

# map with failure in Numpy

How can I implement the following with a numpy array in Python?

``````
In [13]: [(x if x > 3 else None) for x in range(10)]
Out[13]: [None, None, None, None, 4, 5, 6, 7, 8, 9]
``````

In other words, I am looking for a NumPy function with the signature `f: [a] -> (a -> Maybe a) -> [Maybe a]`, where `[a]` would be a NumPy array.

I was trying this:

``````
np.apply_along_axis(lambda x: x if x > 3 else None, 0, np.arange(10))
``````

but it does not work:

``````
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
``````
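
The attempt fails because `apply_along_axis` hands the callback a whole 1-D slice rather than a single element, so `x > 3` is an array and `if` cannot reduce it to one truth value. The sketch below illustrates this; `record` and `seen` are hypothetical names used only for the demonstration, and the `np.vectorize` call is one possible element-wise alternative (a convenience loop, not a fast path).

```python
import numpy as np

seen = []

def record(x):
    # Hypothetical helper: remember what apply_along_axis passes in.
    seen.append(x)
    return x

np.apply_along_axis(record, 0, np.arange(10))

# The callback ran once and received the full 1-D array, not a scalar.
print(len(seen), seen[0].shape)

# An element-wise map that allows None needs an object result dtype.
maybe = np.vectorize(lambda x: x if x > 3 else None, otypes=[object])
result = maybe(np.arange(10))
print(result)
```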

NumPy's `where()` will do the trick:

``````
In [429]: import numpy as np

In [430]: arr = np.arange(10, dtype=object)  # np.object was removed in NumPy 1.24

In [431]: np.where(arr > 3, arr, None)
Out[431]: array([None, None, None, None, 4, 5, 6, 7, 8, 9], dtype=object)
``````

The code above creates a new array. If you wish to modify `arr` in place, you could use boolean indexing `arr[arr < 4] = None` (as pointed out by @Chris Mueller) or `putmask()`:

``````
In [432]: np.putmask(arr, arr < 4, None)

In [433]: arr
Out[433]: array([None, None, None, None, 4, 5, 6, 7, 8, 9], dtype=object)
``````
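
To make the copy versus in-place distinction concrete, here is a minimal sketch as a plain script. The names `masked_copy` and `arr_inplace` are illustrative and not part of the session above.

```python
import numpy as np

arr = np.arange(10, dtype=object)

# np.where builds a new array; arr itself is left untouched.
masked_copy = np.where(arr > 3, arr, None)

# Boolean indexing writes into the array it is applied to.
arr_inplace = arr.copy()
arr_inplace[arr_inplace < 4] = None

print(arr)          # original values, unchanged
print(masked_copy)  # new array with None in the first four slots
print(arr_inplace)  # modified in place
```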

Unless you are constrained to use `None` as a "flag" value, I would suggest following @ev-br's recommendation and using `np.nan` instead, which lets the array keep a float dtype rather than `object`. I will follow that approach to assess performance:

``````
In [434]: arr = np.arange(1000000, dtype=float)  # np.float was removed in NumPy 1.24

In [435]: timeit np.where(arr > 3, arr, np.nan)
100 loops, best of 3: 3.61 ms per loop

In [436]: timeit arr[arr < 4] = np.nan
1000 loops, best of 3: 564 µs per loop

In [437]: timeit np.putmask(arr, arr < 4, np.nan)
1000 loops, best of 3: 1.08 ms per loop
``````

Notice that I used a much larger array to further highlight efficiency differences. And the winner is... boolean indexing (564 µs), ahead of `putmask()` (1.08 ms) and `where()` (3.61 ms).
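
For completeness, a small sketch of how the `np.nan` flag might be used afterwards, assuming a float array is acceptable for your data (an integer array cannot hold `nan`). The variable names are illustrative.

```python
import numpy as np

arr = np.arange(10, dtype=float)
arr[arr < 4] = np.nan        # in place; the array stays float64

missing = np.isnan(arr)      # boolean mask of the "failed" entries
valid = arr[~missing]        # the values that passed the test

print(missing.sum())         # number of flagged entries
print(valid)
print(np.nanmean(arr))       # mean that skips the nan entries
```

`np.isnan` is needed here because `nan == nan` is `False`, so an equality test would not find the flagged entries.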
