sami - 1 month ago
Python Question

Pandas: filtering datetime columns that include None

I have a pandas DataFrame with two datetime columns, t1 and t2. I need to select the rows of the DataFrame where t1 <= t2.
t2 can be NaN (None).

Before pandas 0.19.0
I could do this:

import numpy as np
import pandas as pd
from datetime import datetime
dt = datetime.utcnow()
dt64 = np.datetime64(dt)
df = pd.DataFrame([(dt64, None)], columns=['t1', 't2'])
df[(df.t1 <= df.t2)]


After upgrading to pandas 0.19.0, this code fails:

Traceback (most recent call last):
  File "workspace/python/MyTests/test1.py", line 87, in <module>
    testDfTimeCompare()
  File "workspace/python/MyTests/test1.py", line 80, in testDfTimeCompare
    df[(df.t1<=df.t2)]
  File "anaconda/lib/python2.7/site-packages/pandas/core/ops.py", line 813, in wrapper
    return self._constructor(na_op(self.values, other.values),
  File "anaconda/lib/python2.7/site-packages/pandas/core/ops.py", line 787, in na_op
    y = y.view('i8')
  File "anaconda/lib/python2.7/site-packages/numpy/core/_internal.py", line 367, in _view_is_safe
    raise TypeError("Cannot change data-type for object array.")
TypeError: Cannot change data-type for object array.
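The error message points at an object array. A quick way to see where that comes from is to inspect the dtypes: because t2 holds only None, pandas keeps it as object dtype instead of datetime64. The sketch below uses a fixed timestamp in place of datetime.utcnow() so the result is reproducible; that substitution is an assumption for illustration only.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Fixed timestamp instead of datetime.utcnow(), for reproducibility
dt64 = np.datetime64(datetime(2016, 11, 4, 7, 24, 53, 372838))
df = pd.DataFrame([(dt64, None)], columns=['t1', 't2'])

# t1 is inferred as a datetime64 column; t2 contains only None, so it stays object
print(df.dtypes)
```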


What is the best way to achieve this?

Answer

I think you need to convert column t2 with to_datetime, which casts None to NaT. Then you can use Series.le, the method equivalent of <=. A <= comparison against NaT returns False:

df.t2 = pd.to_datetime(df.t2)
print (df)
                          t1  t2
0 2016-11-04 07:24:53.372838 NaT

mask = df.t1.le(df.t2)
print (mask)
0    False
dtype: bool

mask = df.t1 <= df.t2
print (mask)
0    False
dtype: bool
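Putting the steps together, a minimal self-contained sketch might look like the following. The three sample rows are hypothetical data chosen to show each case (missing t2, t1 before t2, t1 after t2). The second filter is an assumed variant for the case where rows with a missing t2 should be kept rather than dropped; Series.isna is used to add them back.

```python
from datetime import datetime

import pandas as pd

# Hypothetical sample data: missing t2, t1 <= t2, and t1 > t2
df = pd.DataFrame(
    [(datetime(2016, 11, 4, 7, 24), None),
     (datetime(2016, 11, 4, 7, 24), datetime(2016, 11, 5)),
     (datetime(2016, 11, 4, 7, 24), datetime(2016, 11, 3))],
    columns=['t1', 't2'],
)

# Make sure t2 is datetime64; any None becomes NaT
df['t2'] = pd.to_datetime(df['t2'])

# <= against NaT is False, so rows with a missing t2 are excluded
result = df[df['t1'].le(df['t2'])]
print(result)

# Variant (assumption): also keep rows where t2 is missing
keep_missing = df[df['t1'].le(df['t2']) | df['t2'].isna()]
print(keep_missing)
```

Whether rows with a missing t2 should be kept depends on what NaT means in your data; the plain mask drops them.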