vera vera - 3 months ago 41
Python Question

pd.to_datetime or parse datetimes won't work with my csv file (format: dd/mm/yyyy, hh:mm:ss)

I extracted the following table from my csv file

Date,Time,CO2(ppm),CellTemp(c),CellPres(kPa)
10/08/2016,13:21:11,356.89,51.07,99.91
10/08/2016,13:21:12,356.89,51.07,99.91
10/08/2016,13:21:13,356.83,51.07,99.91


I researched the last couple of days and tried different things to make pandas read the
Date
and
Time
columns as
datetime
, but I just can't make it. Here are some of the things I tried:

df = pd.read_csv(myfile)
print(df.dtypes)


I get:

Date object
Time object
CO2(ppm) object
CellTemp(c) object
CellPres(kPa) object
dtype: object


When I try:

df_2 = pd.read_csv(file, parse_dates=[['Date', 'Time']])
print(df_2.dtypes)


I get

Date_Time object
CO2(ppm) object
CellTemp(c) object
CellPres(kPa) object
dtype: object


So, now
Date
and
Time
are in one column (
11/08/2016 14:06:18
) (what I want), but not recognized as
datetime
.

When I then try:

pd.to_datetime(df_2['Date_Time'], format='%d/%m/%Y %H:%M:%S)


I get the error message:

File "<ipython-input-31-ace4ed1a0aa9>", line 1
pd.to_datetime(df_2['Date_Time'],format='%d/%m/%Y %H:%M:%S
SyntaxError: EOL while scanning string literal


When I try:

import dateutil.parser
dateutil.parser.parse(df_2['Date_Time'])


I get (besides some other output) the error message:

AttributeError: 'Series' object has no attribute 'read'


I also manually changed the dateformat to
yyyy-mm-dd
in Excel and tried the same things without any better result. I kind of think it must be a very basic mistake I am doing, I am new to scripting and would appreciate any help. Please apologize if my question has formatting errors I really tried.

Answer

It looks like there is some wrong datetime or some value, which cannot be converted to datetime, so you can add parameter errors='coerce' to to_datetime for converting them to NaT:

#31.11. does not exist
print (df_2)
             Date_Time  CO2(ppm)  CellTemp(c)  CellPres(kPa)  
0  10/08/2016 13:21:11    356.89        51.07            99.91
1  10/08/2016 13:21:12    356.89        51.07            99.91
2  31/11/2016 13:21:13    356.83        51.07            99.91

df_2['Date_Time'] = pd.to_datetime(df_2['Date_Time'],
                                   format='%d/%m/%Y %H:%M:%S',
                                   errors='coerce')
print (df_2)
            Date_Time  CO2(ppm)  CellTemp(c)  CellPres(kPa)  
0 2016-08-10 13:21:11    356.89        51.07            99.91
1 2016-08-10 13:21:12    356.89        51.07            99.91
2                 NaT    356.83        51.07            99.91

You can also check all problematic values by:

print (df_2[pd.to_datetime(df_2['Date_Time'],format='%d/%m/%Y %H:%M:%S', errors='coerce').isnull()])
             Date_Time  CO2(ppm)  CellTemp(c)  CellPres(kPa)  
2  31/11/2016 13:21:13    356.83        51.07            99.91