LancelotHolmes LancelotHolmes - 1 month ago 9
Python Question

Python Change String(date) format of specific column in csv file

I'm trying to convert the value of a column named

date
in my csv file to another format, for instance:

Original records



transfer id,player id,player name,season,date
732058,1126,,12/13,Jul 1- 2012
581951,1126,,11/12,Jun 3- 2011
295000,1126,,09/10,Aug 12- 2009
98459,1126,,06/07,Nov 6- 2006
7267,1126,,03/04,Jul 2- 2003
...


and I'd like to get a result like

transfer id,player id,player name,season,date
732058,1126,,12/13,2012-07-01
581951,1126,,11/12,2011-06-03
295000,1126,,09/10,2009-08-12
98459,1126,,06/07,2006-11-06
7267,1126,,03/04,2003-07-02
...


as the data stored in csv file are string, I have written a method which can convert the date format:

import time

# convert date from original format to new format
def date_convert(_date,fmt_original,fmt_new):
if date_validate(_date,fmt_original):
timeArray=time.strptime(_date,fmt_original)
return time.strftime(fmt_new,timeArray)
else:
return '0001-01-01'

def date_validate(_date,fmt_original):
try:
time.strptime(_date, fmt_original)
return True
except ValueError:
return False


then I try to change the date in csv file,and I try to utilize
pandas
, as is told by @MaxU:

and I write a code like

import pandas as pd
import date_format

df=pd.read_csv('the_transfer_info_test.csv',delimiter=',')
df.date=date_format.date_convert(df.date, '%b %d- %Y', '%Y-%m-%d')
print df


and at first I got an exception like this:

TypeError: expected string or buffer


I thought it may be related to the data type,as the df.date get a type of Series in pandas, so I coded as

df.date=date_format.date_convert(str(df.date), '%b %d- %Y', '%Y-%m-%d')


but it returns all
0001-01-01
which is the exception date in date_format, therefore I searched how to convert the Series to String and find an answer like
@Amit, and I tried methods below:

df['date'].astype(basestring)
df.date.apply(str)
df['date'].astype(str)
df['date'].astype('str')


but they don't work for me, I got the same exception like:

TypeError: expected string or buffer


I wonder how I can convert specific column values in csv file,either utilize pandas or not.

BTW, my python version is 2.7.12 with IDE PyCharm and Anoconda 4.0.0 and pandas 0.18.0.

Any help is appreciated,thank you.




Thanks to @jezrael, for my sample above it both work well, it's my fault that I meant to simplify my issue and simplified my question,actually my original data is like:

transfer id,player id,player name,season,date,move from,move from id,move to,move to id,market value,transfer fee
732058,1126,,12/13,Jul 1- 2012,e-frankfurt,24,1-fc-koln,3,£1.06m,Free transfer
581951,1126,,11/12,Jul 1- 2011,fc-st-pauli,35,eintracht-frankfurt,24,£1.70m,£425k
295000,1126,,09/10,Jul 1- 2009,alem-aachen,8,fc-st-pauli,35,£850k,Free transfer
98459,1126,,06/07,Jul 1- 2006,1860-munich,72,alemannia-aachen,8,£1.36m,£765k
7267,1126,,03/04,Jul 1- 2003,stuttgart-ii,102,tsv-1860-munich,72,-,£21k
...


and actually these approaches work well with part of my data I mean if I test it with afew lines with the same format,but when it comes to the original data which is about 40000 records,it's wired that those approaches do not work any more, for
to_datetime
method,I got an exception like

ValueError: time data '-' does not match format '%b %d- %Y' (match)


while with the second method as
parse_dates
, the date format remains the same like
Jun 11- 2016
.

Again,any help will be appreciated.

Answer

I think you need to_datetime:

df.date = pd.to_datetime(df.date, format='%b %d- %Y')
print (df)
   transfer id  player id  player name season       date
0       732058       1126          NaN  12/13 2012-07-01
1       581951       1126          NaN  11/12 2011-06-03
2       295000       1126          NaN  09/10 2009-08-12
3        98459       1126          NaN  06/07 2006-11-06
4         7267       1126          NaN  03/04 2003-07-02

but is seems you can use parameter parse_dates in read_csv:

import pandas as pd
from pandas.compat import StringIO

temp=u"""transfer id,player id,player name,season,date
732058,1126,,12/13,Jul 1- 2012
581951,1126,,11/12,Jun 3- 2011
295000,1126,,09/10,Aug 12- 2009
98459,1126,,06/07,Nov 6- 2006
7267,1126,,03/04,Jul 2- 2003
"""
#after testing replace StringIO(temp) to filename
df = pd.read_csv(StringIO(temp), parse_dates=['date'])

print (df)
   transfer id  player id  player name season       date
0       732058       1126          NaN  12/13 2012-07-01
1       581951       1126          NaN  11/12 2011-06-03
2       295000       1126          NaN  09/10 2009-08-12
3        98459       1126          NaN  06/07 2006-11-06
4         7267       1126          NaN  03/04 2003-07-02

EDIT by comment:

You need parameter errors='coerce' for replacing bad data (which cannot match format to NaT):

df.date = pd.to_datetime(df.date, format='%b %d- %Y', errors='coerce')

print (df)
   transfer id  player id  player name season         date     move from  \
0       732058       1126          NaN  12/13  Jul 1- 2012   e-frankfurt   
1       581951       1126          NaN  11/12  Jul 1- 2011   fc-st-pauli   
2       295000       1126          NaN  09/10  Jul 1- 2009   alem-aachen   
3        98459       1126          NaN  06/07  Jul 1- 2006   1860-munich   
4         7267       1126          NaN  03/04  Jul 1- 2003  stuttgart-ii   
5         7267       1126          NaN  03/04            -  stuttgart-ii   

   move from id              move to  move to id market value   transfer fee  
0            24            1-fc-koln           3       £1.06m  Free transfer  
1            35  eintracht-frankfurt          24       £1.70m          £425k  
2             8          fc-st-pauli          35        £850k  Free transfer  
3            72     alemannia-aachen           8       £1.36m          £765k  
4           102      tsv-1860-munich          72            -           £21k  
5           102      tsv-1860-munich          72            -           £21k 
df.date = pd.to_datetime(df.date, format='%b %d- %Y', errors='coerce')
print (df)
   transfer id  player id  player name season       date     move from  \
0       732058       1126          NaN  12/13 2012-07-01   e-frankfurt   
1       581951       1126          NaN  11/12 2011-07-01   fc-st-pauli   
2       295000       1126          NaN  09/10 2009-07-01   alem-aachen   
3        98459       1126          NaN  06/07 2006-07-01   1860-munich   
4         7267       1126          NaN  03/04 2003-07-01  stuttgart-ii   
5         7267       1126          NaN  03/04        NaT  stuttgart-ii   

   move from id              move to  move to id market value   transfer fee  
0            24            1-fc-koln           3       £1.06m  Free transfer  
1            35  eintracht-frankfurt          24       £1.70m          £425k  
2             8          fc-st-pauli          35        £850k  Free transfer  
3            72     alemannia-aachen           8       £1.36m          £765k  
4           102      tsv-1860-munich          72            -           £21k  
5           102      tsv-1860-munich          72            -           £21k  
Comments