Chris Chris - 5 months ago 96
Python Question

Find the maximum value in a pandas DataFrame that also contains None values (Python 3.5)

I have a pandas DataFrame set up like this:

      Group1      Group2      Group3
0     0.04058678  0.04282689  0.06680679
1     0.11657916  0.06695174  0.05153584
2     0.08382576  0.03587087  0.08919266
3     0.17477007  0.08141088  0.10727157
4     0.0821453   0.08226264  0.06800853
5     0.15685707  None        0.09467674
6     0.08237982  None        0.14494069
7     None        None        0.14541177
8     None        None        0.12181681
9     None        None        0.17966472
10    None        None        0.1509818


I tried df.max() to find the maximum value in the DataFrame, but it fails on this data, I think because some fields contain None.

I get this error:

print(df.max())
TypeError: unorderable types: float() > str()


How do I deal with None in this dataframe so that I can get the maximum value?

Answer

Is this what you want?

maximum element:

In [53]: df.replace('None', np.nan).astype(float).max().max()
Out[53]: 0.17966472

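or, with -np.inf as the placeholder (note that without astype(float) the result is a Series that lists only Group3, apparently the one column that is already numeric):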

In [46]: df.replace('None', -np.inf).max()
Out[46]:
Group3    0.179665
dtype: float64
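
For reference, here is a minimal, self-contained sketch of the overall-maximum step. It assumes (the question does not confirm this) that the missing entries are the literal string 'None' and that Group1 and Group2 are therefore stored as strings. Only four rows of the data are reproduced, and the names numeric and overall_max are illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: missing entries are the literal string 'None', so Group1 and
# Group2 hold strings rather than floats. Rows 3, 4, 5 and 9 only.
df = pd.DataFrame(
    {
        "Group1": ["0.17477007", "0.0821453", "0.15685707", "None"],
        "Group2": ["0.08141088", "0.08226264", "None", "None"],
        "Group3": [0.10727157, 0.06800853, 0.09467674, 0.17966472],
    },
    index=[3, 4, 5, 9],
)

# Swap the placeholder for NaN, then force every column to float so that
# no column is left out of the reduction.
numeric = df.replace("None", np.nan).astype(float)

# The first max() reduces each column (NaN is skipped by default);
# the second max() reduces the resulting Series to a single number.
overall_max = numeric.max().max()
print(overall_max)
```

Converting with astype(float) before reducing is what makes the result cover all three columns rather than only the one that was numeric to begin with.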

maximum per column:

In [35]: df.replace('None', np.nan).astype(float).max()
Out[35]:
Group1    0.174770
Group2    0.082263
Group3    0.179665
dtype: float64
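
If the columns may contain other non-numeric text besides 'None', one alternative to replace(...).astype(float) is pd.to_numeric with errors="coerce". The sketch below uses a hypothetical three-row frame in which 'n/a' is an invented second placeholder; it is not part of the original data.

```python
import pandas as pd

# Hypothetical frame: 'None' and 'n/a' stand in for any non-numeric text.
df = pd.DataFrame(
    {
        "Group1": ["0.17477007", "0.0821453", "None"],
        "Group2": ["0.08141088", "n/a", "None"],
        "Group3": [0.10727157, 0.06800853, 0.17966472],
    }
)

# errors="coerce" turns every value that cannot be parsed as a number
# into NaN instead of raising.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Per-column maxima; NaN entries are skipped by default.
column_max = numeric.max()
print(column_max)
```

The trade-off is that coercion silently hides genuinely malformed values, so it may be worth counting the NaN entries afterwards.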

or the index labels of the maximum values in each column:

In [28]: df.replace('None', np.nan).astype('float').idxmax()
Out[28]:
Group1    3
Group2    4
Group3    9
dtype: int64
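
To locate the single largest value (row and column) rather than one row label per column, idxmax can be applied twice. This is a sketch under the same assumption that the placeholders are the string 'None'; the variable names are illustrative and only four rows of the data are used.

```python
import numpy as np
import pandas as pd

# Assumption: string 'None' placeholders, as the question's output suggests.
# The index labels are the original row numbers of the rows reproduced here.
df = pd.DataFrame(
    {
        "Group1": ["0.17477007", "0.0821453", "0.15685707", "None"],
        "Group2": ["0.08141088", "0.08226264", "None", "None"],
        "Group3": [0.10727157, 0.06800853, 0.09467674, 0.17966472],
    },
    index=[3, 4, 5, 9],
)
numeric = df.replace("None", np.nan).astype(float)

row_of_max = numeric.idxmax()          # row label of the maximum, per column
best_column = numeric.max().idxmax()   # column that holds the overall maximum
best_row = row_of_max[best_column]     # row label of the overall maximum
print(best_row, best_column, numeric.loc[best_row, best_column])
```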

Explanation:

First, replace every 'None' with np.nan (not a number):

In [56]: df.replace('None', np.nan)
Out[56]:
        Group1      Group2    Group3
0   0.04058678  0.04282689  0.066807
1   0.11657916  0.06695174  0.051536
2   0.08382576  0.03587087  0.089193
3   0.17477007  0.08141088  0.107272
4    0.0821453  0.08226264  0.068009
5   0.15685707         NaN  0.094677
6   0.08237982         NaN  0.144941
7          NaN         NaN  0.145412
8          NaN         NaN  0.121817
9          NaN         NaN  0.179665
10         NaN         NaN  0.150982
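
Note that replace('None', ...) matches the string 'None', not the Python object None. If it is unclear which of the two a column contains, a quick check along these lines may help. The Series below is hypothetical and mixes both kinds on purpose.

```python
import pandas as pd

# Hypothetical column mixing a string placeholder and a real None.
s = pd.Series([0.15685707, "None", None], dtype=object)

n_placeholder = int((s == "None").sum())  # entries equal to the string 'None'
n_missing = int(s.isna().sum())           # real None / NaN entries

# to_numeric with errors="coerce" maps both kinds to NaN.
cleaned = pd.to_numeric(s, errors="coerce")
print(n_placeholder, n_missing, int(cleaned.isna().sum()))
```

A real None is already treated as missing by pandas, so only the string form needs replacing before the conversion to float.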

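Then take the maximum of each column, which returns a pandas Series (only Group3 is listed here, apparently because Group1 and Group2 are still non-numeric until they are converted with astype(float)):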

In [59]: df.replace('None', np.nan).max()
Out[59]:
Group3    0.179665
dtype: float64

In [67]: type(df.replace('None', 0).max())
Out[67]: pandas.core.series.Series

Finally, convert to float and take the maximum of the resulting Series:

In [68]: df.replace('None', np.nan).astype(float).max().max()
Out[68]: 0.17966472
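
On the choice of placeholder: a numeric stand-in such as 0 (as in the type() check above) only gives the right maximum when the real data contain a value at least that large. The sketch below uses a hypothetical all-negative Series to show the difference; np.nan and -np.inf both avoid the problem.

```python
import numpy as np
import pandas as pd

# Hypothetical all-negative data with one missing entry.
s = pd.Series([-0.5, -0.2, np.nan])

max_with_nan = s.max()                      # NaN is skipped: -0.2
max_with_zero = s.fillna(0).max()           # the stand-in wins: 0.0
max_with_neg_inf = s.fillna(-np.inf).max()  # -inf never wins: -0.2
print(max_with_nan, max_with_zero, max_with_neg_inf)
```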