meow - 1 year ago 146

Python Question

I tried to use NumPy's nanmax function to get the maximum of the non-NaN values in each column of a matrix. For some columns it works; for others it returns nan as the maximum. However, there are non-NaN values in every column, and just to be sure I tried the same thing in R with max(x, na.rm = T), where everything is fine.

Does anyone have an idea why this occurs? The only thing I can think of is that I converted the numpy matrix from a pandas frame, but I really have no clue...

np.nanmax(datamatrix, axis=0)

matrix([[1, 101, 193, 1, 163.0, 10.6, nan, 4.7, 142.0, 0.47, 595.0,
         170.0, 5.73, 24.0, 27.0, 23.0, 361.0, 33.0, 94.0, 9.2, 16.8, nan,
         nan, 91.0, nan, nan, nan, nan, 0.0, 105.0, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan]], dtype=object)

Answer Source

Your array is an `object` array, meaning the elements in the array are arbitrary Python objects. Pandas uses object arrays, so it is likely that when you converted your Pandas DataFrame to a numpy array, the result was an object array. `nanmax()` doesn't handle object arrays correctly.

Here are a couple of examples, one using a `numpy.matrix` and one using a `numpy.ndarray`. With a `matrix`, you get no warning at all that something went wrong:

```
In [1]: import numpy as np
In [2]: m = np.matrix([[2.0, np.nan, np.nan]], dtype=object)
In [3]: np.nanmax(m)
Out[3]: nan
```

With an array, you get a cryptic warning, but `nan` is still returned:

```
In [4]: a = np.array([[2.0, np.nan, np.nan]], dtype=object)
In [5]: np.nanmax(a)
/Users/warren/miniconda3scipy/lib/python3.5/site-packages/numpy/lib/nanfunctions.py:326: RuntimeWarning: All-NaN slice encountered
warnings.warn("All-NaN slice encountered", RuntimeWarning)
Out[5]: nan
```

You can determine whether your array is an object array in a few ways. When you display the array in an interactive Python or IPython shell, you'll see `dtype=object`. Or you can check `a.dtype`; if `a` is an object array, you'll see either `dtype('O')` or `object` (depending on whether you end up seeing the `repr()` or `str()` of the dtype).
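If you would rather test for this in code than read it off the display, a minimal sketch of the check might look like the following. The array names `a` and `b` are only illustrative, chosen to match the examples in this answer.

```python
import numpy as np

# Illustrative arrays: one object array and its float64 counterpart.
a = np.array([[2.0, np.nan, np.nan]], dtype=object)
b = a.astype(np.float64)

print(a.dtype)        # str() of the dtype: object
print(repr(a.dtype))  # repr() of the dtype: dtype('O')

# Programmatic checks: compare the dtype against the builtin `object`,
# or look at the one-letter dtype kind ('O' for object, 'f' for float).
a_is_object = (a.dtype == object)
b_is_object = (b.dtype == object)
print(a_is_object, b_is_object)
print(a.dtype.kind, b.dtype.kind)
```

Comparing against `object` is probably the most readable form; `dtype.kind` is handy when you want to branch on several kinds at once.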

Assuming all the values in the array are, in fact, floating point values, a way to work around this is to first convert from the object array to an array of floating point values:

```
In [6]: b = a.astype(np.float64)
In [7]: b
Out[7]: array([[ 2., nan, nan]])
In [8]: np.nanmax(b)
Out[8]: 2.0
In [9]: n = m.astype(np.float64)
In [10]: np.nanmax(n)
Out[10]: 2.0
```
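For the column-wise case in the question (`axis=0`), the same workaround can be wrapped in a small helper. This is only a sketch: `column_nanmax` is a hypothetical name, the sample data is made up, and it assumes every element can be converted to a float.

```python
import numpy as np

def column_nanmax(arr):
    """Convert to float64, then take the NaN-ignoring max of each column.

    Hypothetical helper; assumes every element is convertible to float.
    """
    as_float = np.asarray(arr, dtype=np.float64)
    return np.nanmax(as_float, axis=0)

# Made-up object array with a mix of ints, floats and NaNs;
# every column has at least one non-NaN value.
data = np.array([[1, 10.6, np.nan],
                 [3, np.nan, np.nan],
                 [2, 4.7, 0.5]], dtype=object)

col_max = column_nanmax(data)
print(col_max)
```

If a column really is all NaN, `np.nanmax` on the float array still returns `nan` for that column and emits the "All-NaN slice encountered" warning, which in that case is accurate.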