user2242044 user2242044 - 3 months ago 13
Python Question

Alternatives to looping in Pandas when you need to update a column based on another

I have a

Pandas
dataframe
with text dates that'd I like to convert to
datetime
. The problem is some of my text dates are bad data and thus can't be converted. In cases for which a date can't be converted, I want to update an
Error
column to a value of
True
as well as set the
Date
column to
None
so that it can later be added to a database column that is formatted as a
datetime
.

This is a simplified example. My dataframe may have 1 million rows and multiple date columns this needs to be done for, so I need a faster way of doing this. I know the typical convention is to avoid looping with
Pandas
, but I can't figure out a way around it.

import pandas as pd
import numpy as np
import datetime

data = 1000 *[['010115', None],
['320115', None]]


df = pd.DataFrame(data=data,
columns=['Date', 'Error'])

for index, row in df.iterrows():
try:
datetime.datetime.strptime(row['Date'], '%d%m%y')
except ValueError:
row['Date'] = None
row['Error'] = True
except TypeError:
pass

print df

Answer

You can use to_datetime with parameter errors='coerce' and isnull:

data = 10 *[['010115', None],
        ['320115', None]]


df = pd.DataFrame(data=data,
                  columns=['Date', 'Error'])

print (df)
      Date Error
0   010115  None
1   320115  None
2   010115  None
3   320115  None
4   010115  None
5   320115  None
6   010115  None
7   320115  None
8   010115  None
9   320115  None
10  010115  None
11  320115  None
12  010115  None
13  320115  None
14  010115  None
15  320115  None
16  010115  None
17  320115  None
18  010115  None
19  320115  None
df['Date'] = pd.to_datetime(df['Date'], format='%d%m%y',errors='coerce') 
df['Error'] = df['Date'].isnull()
print (df)
         Date  Error
0  2015-01-01  False
1         NaT   True
2  2015-01-01  False
3         NaT   True
4  2015-01-01  False
5         NaT   True
6  2015-01-01  False
7         NaT   True
8  2015-01-01  False
9         NaT   True
10 2015-01-01  False
11        NaT   True
12 2015-01-01  False
13        NaT   True
14 2015-01-01  False
15        NaT   True
16 2015-01-01  False
17        NaT   True
18 2015-01-01  False
19        NaT   True
Comments