ch36r5s ch36r5s - 3 months ago 17
Python Question

How to remove double quotes from index of csv file in python

I'm trying to read multiple

csv
files with python. The index of raw data(or the first column) has a little problem, the partial csv file looks like this:

NoDemande;"NoUsager";"Sens";"IdVehiculeUtilise";"NoConducteur";"NoAdresse";"Fait";"HeurePrevue"
42210000003;"42210000529";"+";"265Véh";"42210000032";"42210002932";"1";"25/07/2015 10:00:04"
42210000005;"42210001805";"+";"265Véh";"42210000032";"42210002932";"1";"25/07/2015 10:00:04"
42210000004;"42210002678";"+";"265Véh";"42210000032";"42210002932";"1";"25/07/2015 10:00:04"
42210000003;"42210000529";"—";"265Véh";"42210000032";"42210004900";"1";"25/07/2015 10:50:03"
42210000004;"42210002678";"—";"265Véh";"42210000032";"42210007072";"1";"25/07/2015 11:25:03"
42210000005;"42210001805";"—";"265Véh";"42210000032";"42210004236";"1";"25/07/2015 11:40:03"


The first index has no
""
, after reading the file, it looks like:
"NoDemande"
while others have no
""
, and the rest of column looks just fine, which makes the result looks like(not the same lines):

"NoDemande" NoUsager Sens IdVehiculeUtilise NoConducteur NoAdresse Fait HeurePrevue
42209000003 42209001975 + 245Véh 42209000002 42209005712 1 24/07/2015 06:30:04
42209000004 42209002021 + 245Véh 42209000002 42209005712 1 24/07/2015 06:30:04
42209000005 42209002208 + 245Véh 42209000002 42209005713 1 24/07/2015 06:45:04
42216000357 42216001501 - 190Véh 42216000139 42216001418 1 31/07/2015 17:15:03
42216000139 42216000788 - 309V7pVéh 42216000059 42216006210 1 31/07/2015 17:15:03
42216000118 42216000188 - 198Véh 42216000051 42216006374 1 31/07/2015 17:15:03


It causes problem identifying name of index in the coming moves. How to solve this problem?
Here's my code of reading files:

import pandas as pd
import glob

pd.set_option('expand_frame_repr', False)
path = r'D:\Python27\mypfe\data_test'
allFiles = glob.glob(path + "/*.csv")
frame = pd.DataFrame()
list_ = []

for file_ in allFiles:
#Read file
df = pd.read_csv(file_,header=0,sep=';',dayfirst=True,encoding='utf8',
dtype='str')

df['Sens'].replace(u'\u2014','-',inplace=True)

list_.append(df)
print"fichier lu ",file_

frame = pd.concat(list_)
print frame

Answer

I think the simpliest is set new column names:

df.columns = ['NoDemande1'] + df.columns[1:].tolist()
print (df)
    NoDemande1     NoUsager Sens IdVehiculeUtilise  NoConducteur    NoAdresse  \
0  42210000003  42210000529    +            265Véh   42210000032  42210002932   
1  42210000005  42210001805    +            265Véh   42210000032  42210002932   
2  42210000004  42210002678    +            265Véh   42210000032  42210002932   
3  42210000003  42210000529    -           265Véh   42210000032  42210004900   
4  42210000004  42210002678    -           265Véh   42210000032  42210007072   
5  42210000005  42210001805    -           265Véh   42210000032  42210004236   

   Fait          HeurePrevue  
0     1  25/07/2015;10:00:04  
1     1  25/07/2015;10:00:04  
2     1  25/07/2015;10:00:04  
3     1  25/07/2015;10:50:03  
4     1  25/07/2015;11:25:03  
5     1  25/07/2015;11:40:03  

Another solution is strip values " from column names:

print (df)
   "NoDemande"     NoUsager Sens IdVehiculeUtilise  NoConducteur    NoAdresse  \
0  42210000003  42210000529    +            265Véh   42210000032  42210002932   
1  42210000005  42210001805    +            265Véh   42210000032  42210002932   
2  42210000004  42210002678    +            265Véh   42210000032  42210002932   
3  42210000003  42210000529    -           265Véh   42210000032  42210004900   
4  42210000004  42210002678    -           265Véh   42210000032  42210007072   
5  42210000005  42210001805    -           265Véh   42210000032  42210004236   

   Fait          HeurePrevue  
0     1  25/07/2015;10:00:04  
1     1  25/07/2015;10:00:04  
2     1  25/07/2015;10:00:04  
3     1  25/07/2015;10:50:03  
4     1  25/07/2015;11:25:03  
5     1  25/07/2015;11:40:03

df.columns = df.columns.str.strip('"')
print (df)
     NoDemande     NoUsager Sens IdVehiculeUtilise  NoConducteur    NoAdresse  \
0  42210000003  42210000529    +            265Véh   42210000032  42210002932   
1  42210000005  42210001805    +            265Véh   42210000032  42210002932   
2  42210000004  42210002678    +            265Véh   42210000032  42210002932   
3  42210000003  42210000529    -            265Véh   42210000032  42210004900   
4  42210000004  42210002678    -            265Véh   42210000032  42210007072   
5  42210000005  42210001805    -            265Véh   42210000032  42210004236   

   Fait          HeurePrevue  
0     1  25/07/2015;10:00:04  
1     1  25/07/2015;10:00:04  
2     1  25/07/2015;10:00:04  
3     1  25/07/2015;10:50:03  
4     1  25/07/2015;11:25:03  
5     1  25/07/2015;11:40:03