sami sami - 1 year ago 219
Python Question

Filtering pandas datetime columns that include None

I have a pandas DataFrame with two datetime columns, t1 and t2. I need to filter out all rows where t1 <= t2.
t2 may be NaN.

Before pandas 0.19.0 I could do this:

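import numpy as np  # needed for np.datetime64 below
import pandas as pd
from datetime import datetime
dt = datetime.utcnow()
dt64 = np.datetime64(dt)
# t2 holds None, so that column is not a proper datetime column
df = pd.DataFrame([(dt64,None)], columns=['t1','t2'])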

Since pandas 0.19.0 this code fails when the two columns are compared:

Traceback (most recent call last):
File "workspace/python/MyTests/", line 87, in <module>
File "workspace/python/MyTests/", line 80, in testDfTimeCompare
File "anaconda/lib/python2.7/site-packages/pandas/core/", line 813, in wrapper
return self._constructor(na_op(self.values, other.values),
File "anaconda/lib/python2.7/site-packages/pandas/core/", line 787, in na_op
y = y.view('i8')
File "anaconda/lib/python2.7/site-packages/numpy/core/", line 367, in _view_is_safe
raise TypeError("Cannot change data-type for object array.")
TypeError: Cannot change data-type for object array.

What is the best way to achieve this?

Answer Source

I think you need to convert column t2 with to_datetime so that None is cast to NaT. Then you can use the function Series.le, which does the same as <=:

df.t2 = pd.to_datetime(df.t2)
print (df)
                          t1  t2
0 2016-11-04 07:24:53.372838 NaT

mask = df.t1.le(df.t2)
print (mask)
0    False
dtype: bool

mask = df.t1 <= df.t2
print (mask)
0    False
dtype: bool
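
As a rough sketch of how the mask could then be used for the actual filtering: the sample data and the names kept and kept_strict below are made up for illustration and are not from the question. A comparison against NaT evaluates to False, so rows with a missing t2 survive the first filter. If they should be dropped as well, the second variant shows one possible way.

```python
import pandas as pd

# Hypothetical sample data: the second row has no t2 value.
df = pd.DataFrame({
    "t1": pd.to_datetime(["2016-11-04 07:00", "2016-11-04 08:00", "2016-11-04 09:00"]),
    "t2": [pd.Timestamp("2016-11-04 07:30"), None, pd.Timestamp("2016-11-04 08:30")],
})

# Make sure t2 is a real datetime column; None becomes NaT.
df["t2"] = pd.to_datetime(df["t2"])

# Comparisons against NaT evaluate to False.
mask = df["t1"] <= df["t2"]

# Filter out rows where t1 <= t2; rows with a missing t2 are kept.
kept = df[~mask]

# Variant: additionally drop rows where t2 is missing.
kept_strict = df[~mask & df["t2"].notna()]

print(kept)
print(kept_strict)
```

Whether rows with a missing t2 should be kept or dropped depends on what the filter is meant to express, so pick the variant that matches your data.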