sparrow - 3 months ago
Python Question

Ignore string data that does not match a certain format when calculating "min" with Pandas

I have a DataFrame column 'datetime' with the values in this format:

'2016-08-01 13:43:35'


I would like to find the min and max values. The problem is that some of the rows are missing time values so they look like this:

'2016-07-29 '


How can I exclude the rows with missing data when calculating the min and max?

Here is how I'm finding the min value:

min_ = df['datetime'].min()


The minimum value that I'm trying to find is the earliest date/time combination where both are included. So, for example, from my data:

'7/29/2016 11:02:38' would be the desired value.

Answer

You can convert the values that match a specific format to datetime by passing errors='coerce' to pd.to_datetime; values that do not match the format become NaT. Taking the minimum of the resulting Series ignores the NaT entries.

import pandas as pd

# Sample data; the last row has a date but no time
df = pd.DataFrame({'datetime': ['2016-08-01 13:43:35', '2016-06-01 13:43:35',
                                '2013-08-01 13:43:35', '2016-07-29  ']})
df
Out: 
              datetime
0  2016-08-01 13:43:35
1  2016-06-01 13:43:35
2  2013-08-01 13:43:35
3         2016-07-29  


pd.to_datetime(df['datetime'], format='%Y-%m-%d %H:%M:%S', errors='coerce')
Out: 
0   2016-08-01 13:43:35
1   2016-06-01 13:43:35
2   2013-08-01 13:43:35
3                   NaT
Name: datetime, dtype: datetime64[ns]

pd.to_datetime(df['datetime'], format='%Y-%m-%d %H:%M:%S', errors='coerce').min()
Out: Timestamp('2013-08-01 13:43:35')
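
Since the question asks for both the min and the max, the same idea can be sketched as a short script that parses once and reuses the result. The variable names (parsed, min_, max_, complete) and the sample data are assumptions for illustration, and the last step (keeping only the complete rows) is an optional extra that the answer above does not include.

```python
import pandas as pd

# Hypothetical sample data mirroring the example above; the last row has no time part.
df = pd.DataFrame({'datetime': ['2016-08-01 13:43:35', '2016-06-01 13:43:35',
                                '2013-08-01 13:43:35', '2016-07-29  ']})

# Strings that cannot be parsed with the full format become NaT instead of raising.
parsed = pd.to_datetime(df['datetime'], format='%Y-%m-%d %H:%M:%S', errors='coerce')

# min() and max() skip NaT by default (skipna=True).
min_ = parsed.min()
max_ = parsed.max()

# Optionally keep only the rows that parsed to a complete date and time.
complete = df[parsed.notna()]

print(min_, max_)
print(complete)
```

Parsing into a separate Series rather than overwriting df['datetime'] leaves the original strings available, which can help when checking which rows were dropped.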