meow meow - 2 months ago 17x
Python Question

Python Numpy nanmax() returning nan when there are non nan values in the array

I tried to use NumPy's nanmax function to get the maximum of all non-nan values in each column of a matrix. For some columns it works; for others it returns nan as the maximum. However, every column contains non-nan values, and just to be sure I tried the same thing in R with max(x, na.rm = T), where everything is fine.

Does anyone have an idea why this occurs? The only thing I can think of is that I converted the NumPy matrix from a pandas DataFrame, but I really have no clue...

np.nanmax(datamatrix, axis=0)

matrix([[1, 101, 193, 1, 163.0, 10.6, nan, 4.7, 142.0, 0.47, 595.0,
170.0, 5.73, 24.0, 27.0, 23.0, 361.0, 33.0, 94.0, 9.2, 16.8, nan,
nan, 91.0, nan, nan, nan, nan, 0.0, 105.0, nan, nan, nan, nan,nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan]], dtype=object)


Your array is an object array, meaning the elements in the array are arbitrary Python objects rather than native floating point numbers. Pandas stores columns that are not purely numeric as object arrays, so it is likely that when you converted your pandas DataFrame to a NumPy array, the result was an object array. nanmax() doesn't handle object arrays correctly, at least in the NumPy version shown here.
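As a rough illustration of how that conversion can produce an object array, here is a minimal sketch. The DataFrame and its column names ("label", "value") are made up for the example and are not taken from the question; it assumes a pandas version that provides DataFrame.to_numpy().

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one text column and one numeric column.
df = pd.DataFrame({
    "label": ["a", "b", "c"],
    "value": [163.0, np.nan, 10.6],
})

# Converting the whole frame forces a common dtype, which is object here
# because the text column cannot be stored as a float.
whole = df.to_numpy()

# Selecting only the numeric column and asking for float64 avoids that.
numeric = df[["value"]].to_numpy(dtype=np.float64)

print(whole.dtype)    # object
print(numeric.dtype)  # float64
```

Checking df.dtypes before converting shows which columns would push the result to object.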

Here are a couple of examples, one using a numpy.matrix and one using a numpy.ndarray. With a matrix, you get no warning at all that something went wrong:

In [1]: import numpy as np

In [2]: m = np.matrix([[2.0, np.nan, np.nan]], dtype=object)

In [3]: np.nanmax(m)
Out[3]: nan

With an array, you get a cryptic warning, but nan is still returned:

In [4]: a = np.array([[2.0, np.nan, np.nan]], dtype=object)

In [5]: np.nanmax(a)
/Users/warren/miniconda3scipy/lib/python3.5/site-packages/numpy/lib/ RuntimeWarning: All-NaN slice encountered
  warnings.warn("All-NaN slice encountered", RuntimeWarning)
Out[5]: nan

You can determine whether your array is an object array in a few ways. When you display the array in an interactive Python or IPython shell, you'll see dtype=object. Or you can check a.dtype; if a is an object array, you'll see either dtype('O') or object (depending on whether you end up seeing the repr() or the str() of the dtype).
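The same check can be done programmatically. This is a small sketch; the helper name is_object_array is made up for illustration and is not a NumPy function.

```python
import numpy as np

a = np.array([[2.0, np.nan, np.nan]], dtype=object)
b = a.astype(np.float64)

def is_object_array(arr):
    """Return True if arr stores generic Python objects (hypothetical helper)."""
    return arr.dtype == np.dtype(object)

print(str(a.dtype), is_object_array(a))   # object True
print(str(b.dtype), is_object_array(b))   # float64 False
print(repr(a.dtype))                      # dtype('O')
```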

Assuming all the values in the array are, in fact, floating point values, a way to work around this is to first convert from the object array to an array of floating point values:

In [6]: b = a.astype(np.float64)

In [7]: b
Out[7]: array([[  2.,  nan,  nan]])

In [8]: np.nanmax(b)
Out[8]: 2.0

In [9]: n = m.astype(np.float64)

In [10]: np.nanmax(n)
Out[10]: 2.0
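Applied to the column-wise case from the question, the workaround might look like the sketch below. The data values are a small made-up sample, and column_nanmax is a hypothetical helper name; the conversion raises an error if any element cannot be interpreted as a float, which is arguably better than silently getting nan.

```python
import numpy as np

# Made-up object array mixing Python ints, floats and nan,
# similar in spirit to the matrix in the question.
data = np.array([[1,   163.0,  np.nan],
                 [101, np.nan, np.nan],
                 [193, 10.6,   4.7]], dtype=object)

def column_nanmax(arr):
    """Convert to float64 first, then take the per-column max ignoring nan."""
    as_float = np.asarray(arr, dtype=np.float64)
    return np.nanmax(as_float, axis=0)

col_max = column_nanmax(data)
print(col_max.tolist())  # [193.0, 163.0, 4.7]
```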