Franklin Harvey Franklin Harvey - 9 months ago 71
Python Question

Python replace multiple date columns with datetime

I have csv files with multiple headers:

Year', 'Month', 'Day', 'Hour', 'Minute' --other headers--

I would like to downsize these with datetime. Right now I'm doing:

date = datetime(year=int(d[0]), month=int(d[1]), day=int(d[2]]), hour=int(d[3]]), minute=int(d[4]]))

But how do I actually delete those columns and put the date column before all other columns? I eventually want the "date" to be the index in a pandas dataframe.

The first five rows of one of my CSV's is:


I would like to have


Answer Source

You can specify the parse_dates parameter as the indices of the date time related columns and write a customized date_parser function to convert the multiple columns as a datetime column. Specify the index_col if you want it to be the index:

import pandas as pd   
from io import StringIO   
            sep = ",", 
            parse_dates={'Date': [0,1,2,3,4]}, 
            date_parser = lambda x: pd.to_datetime(x, format="%Y %m %d %H %M"), 
            index_col = ['Date'])

enter image description here

Update: The way date_parser function works according to pd.read_csv() docs:

1) Pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.

There are multiple ways this parameter can be used depending on how your customized date parser function is written, in the case given above, for instance, the second option is used, for every row, columns 0-4 are concatenated as strings separated by a whitespace and passed to pd.to_datetime() function and so a format of %Y %m %d %H %M works here.