Michael Roberts Michael Roberts - 1 year ago 67
Python Question

Converting dates within the pandas data structure format

So I have used pandas to open up my data which is in .csv format. I'd now like to reformat the dates from its current dd/mm/YYYY hh:mm:ss format to purely YYYY-mm-dd format, e.g., from 19/11/2014 15:26:13 to 2014-11-19. How would I go about doing this within a pandas data array? i.e., converting from this:

Id User Id Start Time End Time Climb Time
0 74618 27366 19/11/2014 15:26:13 19/11/2014 15:26:18 5
1 74632 27366 19/11/2014 15:26:18 19/11/2014 15:42:26 968
2 74633 27366 19/11/2014 15:42:26 19/11/2014 15:42:48 22

To this:

Id User Id Start Time End Time Climb Time
0 74618 27366 2014-11-19 2014-11-19 5
1 74632 27366 2014-11-19 2014-11-19 968
2 74633 27366 2014-11-19 2014-11-19 22

I've tried another of pandas ready methods, but none have seemed to work or recognise the initial data. I was wondering if anyone knew of any methods which could achieve this...

Answer Source

Let me give this a shot and see if it is what you are looking for.

Input CSV is like this (I called it test.csv for this example)

Id,User Id,Start Time,End Time,Climb Time
74618,27366,19/11/2014 15:26:13,19/11/2014 15:26:18,5   
74632,27366,19/11/2014 15:26:18,19/11/2014 15:42:26,968   
74633,27366,19/11/2014 15:42:26,19/11/2014 15:42:48,22

Now some input/output from iPython.

In [1]: import pandas as pd

In [2]: df = pd.read_csv('test.csv')

We can have a look at the conversion to date-time. Note the "type" we get at the end.

In [5]: pd.to_datetime(df['Start Time'])
0   2014-11-19 15:26:13
1   2014-11-19 15:26:18
2   2014-11-19 15:42:26
Name: Start Time, dtype: datetime64[ns]

Now you can save this back to a new column. to_datetime is pretty clever in terms of getting the dates and time parts. If it gets your dates wrong, you can specify the format for the input date-time using the format key-word

df['start_dt'] = pd.to_datetime(df['Start Time'])

Finally, you can check the dtypes for the columns in the Data Frame:

In [9]: df.dtypes
Id                     int64
User Id                int64
Start Time            object
End Time              object
Climb Time             int64
start_dt      datetime64[ns]
dtype: object

Now this new column, it is a datetime object and you can display or sort it in any way you might like to do that. From your OP, it seems you want to just get the date part. That is easy, you can do this (from this post)

In [10]: df['start_dt'].dt.date
0    2014-11-19
1    2014-11-19
2    2014-11-19
Name: start_dt, dtype: object

Now we can put this all together and do the conversion and then put it as a date in one line. It seems you want to recycle the column name, so I am doing that here, but it is not necessary. You can give the 'new' column any name you wish.

In [13]: df['Start Time'] = pd.to_datetime(df['Start Time']).dt.date

In [14]: df['End Time'] = pd.to_datetime(df['End Time']).dt.date

In [15]: df
      Id  User Id  Start Time    End Time  Climb Time    start_dt
0  74618    27366  2014-11-19  2014-11-19           5  2014-11-19
1  74632    27366  2014-11-19  2014-11-19         968  2014-11-19
2  74633    27366  2014-11-19  2014-11-19          22  2014-11-19

I think the result is what you are looking for.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download