ajsp ajsp - 4 days ago 7
Python Question

Quick way of transforming a datetime column in Pandas

I have a mountain of CSVs where the date column looks like the following:

print(df)
Date
0 20090501 00:00:00.831
1 20090501 00:00:00.832
2 20090501 00:00:01.078
3 20090501 00:00:01.337
4 20090501 00:00:01.580
5 20090501 00:00:01.581
6 20090501 00:00:01.582
7 20090501 00:00:01.602


From here I parse it as a datetime using the format '%Y%m%d %H:%M:%S.%f', hence:

df['Date'] = pd.to_datetime(df['Date'], format='%Y%m%d %H:%M:%S.%f')
print(df)
Date
0 2009-05-01 00:00:00.831
1 2009-05-01 00:00:00.832
2 2009-05-01 00:00:01.078
3 2009-05-01 00:00:01.337
4 2009-05-01 00:00:01.580
5 2009-05-01 00:00:01.581


And finally from there, split it into separate date and time columns using the following:

df['Time'] = df['Date'].apply(lambda x: x.time())
df['Date1'] = df['Date'].apply(lambda x: x.date())
print(df)
Time Date1
0 00:00:00.831000 2009-05-01
1 00:00:00.832000 2009-05-01
2 00:00:01.078000 2009-05-01
3 00:00:01.337000 2009-05-01
4 00:00:01.580000 2009-05-01
5 00:00:01.581000 2009-05-01
6 00:00:01.582000 2009-05-01


The problem is that the lambda functions take around one minute to complete, and I have around 30,000 CSVs to process, each with approximately two million rows. A faster solution would help a great deal.

Thanks

Answer

Use the dt.time and dt.date accessors instead of apply with a lambda:

df['Time'] = df['Date'].dt.time
df['Date1'] = df['Date'].dt.date
print(df)
                     Date             Time       Date1
0 2009-05-01 00:00:00.831  00:00:00.831000  2009-05-01
1 2009-05-01 00:00:00.832  00:00:00.832000  2009-05-01
2 2009-05-01 00:00:01.078  00:00:01.078000  2009-05-01
3 2009-05-01 00:00:01.337  00:00:01.337000  2009-05-01
4 2009-05-01 00:00:01.580  00:00:01.580000  2009-05-01
5 2009-05-01 00:00:01.581  00:00:01.581000  2009-05-01
6 2009-05-01 00:00:01.582  00:00:01.582000  2009-05-01
7 2009-05-01 00:00:01.602  00:00:01.602000  2009-05-01
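
For anyone who wants to try this end to end, here is a minimal, self-contained sketch using a few of the sample rows from the question. The last two columns (Day and SinceMidnight) are hypothetical names that are not part of the original question or answer. They show one possible alternative, assuming downstream code can work with native datetime64/timedelta64 columns rather than Python date and time objects. Whether that is faster on your data is something you would need to time yourself.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Date": [
        "20090501 00:00:00.831",
        "20090501 00:00:00.832",
        "20090501 00:00:01.078",
    ]
})

# Parse once with an explicit format so pandas does not have to infer it.
df["Date"] = pd.to_datetime(df["Date"], format="%Y%m%d %H:%M:%S.%f")

# The accessors from the answer: each returns a column of Python
# datetime.time / datetime.date objects (object dtype).
df["Time"] = df["Date"].dt.time
df["Date1"] = df["Date"].dt.date

# Optional alternative (assumption: native dtypes are acceptable downstream).
# "Day" and "SinceMidnight" are illustrative column names only.
df["Day"] = df["Date"].dt.normalize()          # timestamp at midnight of each day
df["SinceMidnight"] = df["Date"] - df["Day"]   # time of day as a timedelta

print(df)
```

The Time and Date1 columns match what the question asked for. The Day and SinceMidnight columns hold the same information but stay in pandas' own dtypes, which can matter if you later filter or do arithmetic on them.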