Sidney - 3 months ago
Python Question

Convert "slightly inconsistent" Pandas Column to DateTime

I have a column of data from different sources, so the time-stamp strings are slightly inconsistent:

data_test
                        DataTime
0  2012-10-03 12:14:18.257000000
1  2012-10-01 08:39:54.633000000
2  2012-10-05 07:50:14.203000000
3  2012-10-02 15:02:42.843000000
4            2012-10-02 09:02:13
5            2012-10-02 09:02:13
6            2012-10-09 11:00:36
7            2012-10-09 11:00:36


Some seconds values are whole numbers and some have a fractional part, so both of the following methods fail:



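import datetime as dt

# Method 1: include fractional seconds (%f). Fails on rows 4-7, which have no fraction,
# and also on rows 0-3, because strptime's %f accepts at most six digits.
data_test['DataTime'] = data_test['DataTime'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f'))
# Method 2: ignore fractional seconds. Fails on rows 0-3 ("unconverted data remains").
data_test['DataTime'] = data_test['DataTime'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))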
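If you want to stay with strptime and apply, one option is to pick the format per value. The sketch below first shows that each single format rejects part of the sample data, then defines a hypothetical helper, parse_timestamp (not from the question or any library), that truncates the fraction to six digits before parsing. The truncation is an assumption: it is harmless for the trailing zeros in this data but would discard any nonzero digits past microseconds.

```python
import datetime as dt

SAMPLES = ["2012-10-03 12:14:18.257000000", "2012-10-02 09:02:13"]

# Each single format rejects at least one sample. strptime's %f accepts at
# most six digits, so the nine-digit fractions fail with the %f format too.
for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
    for value in SAMPLES:
        try:
            dt.datetime.strptime(value, fmt)
        except ValueError as exc:
            print(f"{fmt!r} rejects {value!r}: {exc}")


def parse_timestamp(value):
    """Hypothetical helper: parse 'YYYY-MM-DD HH:MM:SS' with or without a fraction."""
    if "." in value:
        head, frac = value.split(".", 1)
        # Keep only the first six fractional digits (microseconds); the rest
        # are discarded, not rounded.
        return dt.datetime.strptime(f"{head}.{frac[:6]}", "%Y-%m-%d %H:%M:%S.%f")
    return dt.datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


for value in SAMPLES:
    print(parse_timestamp(value))
```

On the question's frame the call would be data_test['DataTime'].apply(parse_timestamp).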


Is there a simple way to convert this column to datetime?

Answer

You can use the pd.to_datetime() function:

In [222]: df
Out[222]:
                        DataTime
0  2012-10-03 12:14:18.257000000
1  2012-10-01 08:39:54.633000000
2  2012-10-05 07:50:14.203000000
3  2012-10-02 15:02:42.843000000
4            2012-10-02 09:02:13
5            2012-10-02 09:02:13
6            2012-10-09 11:00:36
7            2012-10-09 11:00:36

In [223]: df.dtypes
Out[223]:
DataTime    object
dtype: object

In [224]: df.DataTime = pd.to_datetime(df.DataTime)

In [225]: df
Out[225]:
                 DataTime
0 2012-10-03 12:14:18.257
1 2012-10-01 08:39:54.633
2 2012-10-05 07:50:14.203
3 2012-10-02 15:02:42.843
4 2012-10-02 09:02:13.000
5 2012-10-02 09:02:13.000
6 2012-10-09 11:00:36.000
7 2012-10-09 11:00:36.000

In [226]: df.dtypes
Out[226]:
DataTime    datetime64[ns]
dtype: object
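The session relies on pandas having been imported as pd earlier. Below is a self-contained sketch of the same conversion. One caveat, stated as an assumption: starting with pandas 2.0, to_datetime infers a single format from the first non-null value, so a column that mixes fractional and whole seconds may raise instead of parsing as shown above. format="ISO8601", also added in 2.0, parses each ISO-style string individually (format="mixed" is the looser alternative). The version check is added for illustration and is not part of the original answer.

```python
import pandas as pd

# Sample strings copied from the question: rows 0-3 carry nine fractional
# digits, rows 4-7 carry none.
df = pd.DataFrame({"DataTime": [
    "2012-10-03 12:14:18.257000000",
    "2012-10-01 08:39:54.633000000",
    "2012-10-05 07:50:14.203000000",
    "2012-10-02 15:02:42.843000000",
    "2012-10-02 09:02:13",
    "2012-10-02 09:02:13",
    "2012-10-09 11:00:36",
    "2012-10-09 11:00:36",
]})

# Assumption: pandas 2.0+ may reject rows whose format differs from the first
# value, so request per-value ISO 8601 parsing there. Older versions parse
# this column without a format argument.
if int(pd.__version__.split(".")[0]) >= 2:
    df["DataTime"] = pd.to_datetime(df["DataTime"], format="ISO8601")
else:
    df["DataTime"] = pd.to_datetime(df["DataTime"])

print(df.dtypes)
```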