Baig Baig - 3 months ago 22
Python Question

Convert number to date format using Python

I am reading data from a text file with more than 14,000 rows, and one column contains eight-digit numbers. Some of the rows look like this:


  • 01021943

  • 02031944

  • 00041945

  • 00001946



The problem is that when I use the to_date function, it converts the dtype of the column from object to int64, but I want it to be datetime. Second, when I use the to_datetime function, the leading zeros are lost, so dates like


  • 00041945 becomes 41945

  • 00001946 becomes 1946

and hence I cannot format them properly.


Answer

You can pass the dtype parameter to read_csv so that column col is read as a string, which preserves the leading zeros. Then use to_datetime with the format parameter to specify the layout and errors='coerce' so that invalid dates (here, those with a day or month of 00) are converted to NaT:

import pandas as pd
import io

temp=u"""col
01021943
02031944
00041945
00001946"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), dtype={'col': 'str'})

df['col'] = pd.to_datetime(df['col'], format='%d%m%Y', errors='coerce')

print(df)
         col
0 1943-02-01
1 1944-03-02
2        NaT
3        NaT

print(df.dtypes)
col    datetime64[ns]
dtype: object
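
To see which raw values were coerced to NaT, one option is to keep the original strings and parse into a separate column. This is only a minimal sketch: the column name parsed and the variable bad_rows are illustrative and not part of the original answer.

```python
import io
import pandas as pd

temp = u"""col
01021943
02031944
00041945
00001946"""

# Read the column as strings so the leading zeros survive
df = pd.read_csv(io.StringIO(temp), dtype={'col': str})

# Parse into a new (hypothetical) column and keep the raw strings in 'col'
df['parsed'] = pd.to_datetime(df['col'], format='%d%m%Y', errors='coerce')

# Raw values that could not be parsed as day-month-year
bad_rows = df.loc[df['parsed'].isna(), 'col'].tolist()
print(bad_rows)
```

With the sample data, bad_rows holds the two entries whose day or month is 00, which can then be reviewed or handled separately.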

Thanks to Jon Clements for another solution, which parses the column while the file is being read by passing a converter to read_csv:

import pandas as pd
import io

temp=u"""col_name
01021943
02031944
00041945
00001946"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), 
                 converters={'col_name': lambda dt: pd.to_datetime(dt, format='%d%m%Y', errors='coerce')})

print(df)
    col_name
0 1943-02-01
1 1944-03-02
2        NaT
3        NaT

print(df.dtypes)
col_name    datetime64[ns]
dtype: object
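
If the column has already been read as int64 (so 00041945 has become 41945), the leading zeros can be restored before parsing. The sketch below assumes every value was originally eight digits long. The DataFrame is built by hand here only to imitate that situation, and the variable padded is illustrative.

```python
import pandas as pd

# Hypothetical frame imitating a column that was parsed as integers
df = pd.DataFrame({'col': [1021943, 2031944, 41945, 1946]})

# Convert to string and left-pad with zeros back to eight characters
padded = df['col'].astype(str).str.zfill(8)

# Parse as day-month-year; values with day or month 00 become NaT
df['col'] = pd.to_datetime(padded, format='%d%m%Y', errors='coerce')
print(df)
```

Reading the column as a string in the first place, as shown above, avoids this extra step. Padding is a fallback for when the data is already loaded.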