Nicholas - 1 year ago 143

Python Question

What happens when using max() and min() on a pandas.core.series.Series that contains NaN? Is this a bug? See below:

```
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

mydata = pd.DataFrame(np.random.standard_normal((100,1)), columns=['No NaN'])
mydata['Has NaN'] = mydata['No NaN'] / mydata['No NaN'].shift(1)

# Both return NaN!
print(min(mydata['Has NaN']), max(mydata['Has NaN']))

# Still why False? Isn't float('nan') a singleton like None?
print(min(mydata['Has NaN']) == max(mydata['Has NaN']))

# But this time works well!
print(min([1, 2, 3, float('nan')]))
print('\n')

# When Series data type that has NaN bumps into min() and max(), what should
# I do? E.g.,
try:
    n, bins, patches = plt.hist(mydata['Has NaN'], 10)
except ValueError as e:
    print(e, '\nSeems "range" argument in hist() has problem!')
```

Answer Source

First, you shouldn't use the Python built-in `max` or `min` when dealing with `pandas` or `numpy`, especially when you are working with `nan`.

Since `nan` is the first item of `mydata['Has NaN']`, it is never replaced in either `max` or `min`, because every comparison against it evaluates to False (as stated in the docs):

The not-a-number values float('NaN') and Decimal('NaN') are special. They are identical to themselves (x is x is true) but are not equal to themselves (x == x is false). Additionally, comparing any number to a not-a-number value will return False. For example, both 3 < float('NaN') and float('NaN') < 3 will return False.
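The quoted rule also explains why the position of the NaN matters to the built-ins. A minimal pure-Python sketch (the list names `nan_first` and `nan_last` are made up for illustration):

```python
import math

nan = float('nan')

# Every ordered comparison involving NaN is False, and NaN != NaN.
print(nan < 3, 3 < nan, nan == nan)    # False False False

# min()/max() keep their current candidate unless a later item compares
# smaller/larger. With NaN first, no later item ever wins the comparison.
nan_first = [nan, 1.0, 2.0, 3.0]
print(math.isnan(min(nan_first)), math.isnan(max(nan_first)))  # True True

# With NaN last, `nan < 1.0` and `nan > 3.0` are both False,
# so the earlier candidates are kept.
nan_last = [1.0, 2.0, 3.0, nan]
print(min(nan_last), max(nan_last))    # 1.0 3.0
```

This is why `min([1, 2, 3, float('nan')])` in the question appears to "work", and why `min(...) == max(...)` is False even though both are NaN.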

Instead, use the `pandas` `max` and `min` methods, which skip `NaN` by default:

```
In [4]: mydata['Has NaN'].min()
Out[4]: -176.9844930355774
In [5]: mydata['Has NaN'].max()
Out[5]: 12.684033138603787
```
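For reference, here is a small sketch of the NaN handling on a tiny stand-in Series (the values are made up, not taken from the random data above). It assumes a pandas version that provides `Series.to_numpy()`:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 2.0, -1.0, 5.0])  # stand-in for mydata['Has NaN']

# Series.min()/max() skip NaN by default (skipna=True).
print(s.min(), s.max())          # -1.0 5.0

# With skipna=False the NaN propagates instead.
print(s.min(skipna=False))       # nan

# NumPy counterparts: np.min propagates NaN, np.nanmin/np.nanmax ignore it.
arr = s.to_numpy()
print(np.min(arr), np.nanmin(arr), np.nanmax(arr))  # nan -1.0 5.0
```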

With regards to the histogram, it seems this is a known issue with `plt.hist`; see here and here.

It should be fairly straightforward to deal with for now, though:

```
n, bins, patches = plt.hist(mydata['Has NaN'][~mydata['Has NaN'].isnull()], 10)
```
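An equivalent way to write the same filter is `dropna()`. The sketch below uses made-up stand-in data and `np.histogram`, which computes counts and bin edges without drawing anything, so it runs without a plotting backend:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 0.5, 1.5, 2.5, np.nan, 3.5])  # hypothetical stand-in data

clean = s.dropna()                    # same rows as s[~s.isnull()]
print(clean.equals(s[~s.isnull()]))   # True

# Bin the NaN-free values into 10 bins: 10 counts, 11 bin edges.
counts, edges = np.histogram(clean.to_numpy(), bins=10)
print(counts.sum(), edges[0], edges[-1])   # 4 0.5 3.5
```

For the plot itself, `plt.hist(mydata['Has NaN'].dropna(), 10)` should behave the same as the boolean-mask version above.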