SSC - 11 days ago

How to call pandas read_csv() without it parsing date string

I am working with some data that I download from the web in CSV format. The original data is as follows:

Test Data
"Date","T1","T2","T3","T4","T5","T6","T7","T8"
"105/11/01","123,855","1,150,909","9.30","9.36","9.27","9.28","-0.06","60",
"105/11/02","114,385","1,062,118","9.26","9.42","9.23","9.31","+0.03","78",
"105/11/03","71,350","659,848","9.30","9.30","9.20","9.28","-0.03","42",
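To reproduce this locally, here is a minimal sketch that writes the sample above to test.csv and counts the fields on each line with Python's csv module. It assumes the "Test Data" line is the first line of the file (the line that skiprows=[0] skips); the SAMPLE name is just an illustrative constant holding the data copied from above.

```python
import csv

# Sample data copied from the question. Assumption: "Test Data" is the
# first line of the file, which is the line skiprows=[0] skips.
SAMPLE = '''Test Data
"Date","T1","T2","T3","T4","T5","T6","T7","T8"
"105/11/01","123,855","1,150,909","9.30","9.36","9.27","9.28","-0.06","60",
"105/11/02","114,385","1,062,118","9.26","9.42","9.23","9.31","+0.03","78",
"105/11/03","71,350","659,848","9.30","9.30","9.20","9.28","-0.03","42",
'''

with open("test.csv", "w", newline="") as f:
    f.write(SAMPLE)

# Count the fields on each line; commas inside quotes (e.g. "123,855")
# stay within a single field.
with open("test.csv", newline="") as f:
    field_counts = [len(row) for row in csv.reader(f)]

print(field_counts)  # [1, 9, 10, 10, 10]
```

On this sample the header row has 9 fields while each data row has 10, because every data row ends with a trailing comma.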


I use the following code to read it:

import pandas as pd
df = pd.read_csv("test.csv", skiprows=[0], usecols=[0,3,4,5])


I have also tried:

import pandas as pd
df = pd.read_csv("test.csv", skiprows=[0], usecols=[0,3,4,5], keep_date_col=True)


I always get the following result:

Date T3 T4 T5
105/11/01 9.30 9.36 9.27 NaN
105/11/02 9.26 9.42 9.23 NaN
105/11/03 9.30 9.30 9.20 NaN


This is what I want to get:

Date T3 T4 T5
105/11/01 9.30 9.36 9.27
105/11/02 9.26 9.42 9.23
105/11/03 9.30 9.30 9.20


As you can see, pandas treats the date string as if it were not part of the data and shifts everything one column to the left, which leaves the last column as NaN.

I have read the pandas documentation for read_csv() and found that it can parse dates with the parse_dates and keep_date_col parameters, but is there any way to NOT parse the date, as it is doing now?

Answer

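The date is not actually being parsed here. Each data row ends with a trailing comma, so it has one more field than the header row, and in that case pandas uses the first column as the index. Passing index_col=False stops this, and it seems to work well: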

import pandas as pd
df = pd.read_csv("test.csv", skiprows=[0], usecols=[0,3,4,5], index_col=False)

df
#        Date     T3      T4      T5
#0  105/11/01   9.30    9.36    9.27
#1  105/11/02   9.26    9.42    9.23
#2  105/11/03   9.30    9.30    9.20
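A self-contained version of the same call, reading the question's sample from an in-memory string with io.StringIO instead of test.csv so it runs without creating a file. The SAMPLE name is an illustrative constant holding the data copied from the question (assuming "Test Data" is its first line).

```python
import io

import pandas as pd

# Sample data copied from the question; every data row ends with a trailing comma.
SAMPLE = '''Test Data
"Date","T1","T2","T3","T4","T5","T6","T7","T8"
"105/11/01","123,855","1,150,909","9.30","9.36","9.27","9.28","-0.06","60",
"105/11/02","114,385","1,062,118","9.26","9.42","9.23","9.31","+0.03","78",
"105/11/03","71,350","659,848","9.30","9.30","9.20","9.28","-0.03","42",
'''

# index_col=False: keep "Date" as an ordinary column instead of letting
# pandas turn it into the index because of the extra trailing field.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    skiprows=[0],
    usecols=[0, 3, 4, 5],
    index_col=False,
)

print(df)
```

Since parse_dates is not passed, the Date column keeps the raw strings such as "105/11/01", and the frame gets a default 0, 1, 2 index.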

Also, this is from the read_csv() help docs:

index_col : int or sequence or False, default None
    Column to use as the row labels of the DataFrame. If a sequence is given, a
    MultiIndex is used. If you have a malformed file with delimiters at the end
    of each line, you might consider index_col=False to force pandas to _not_
    use the first column as the index (row names)
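A minimal sketch of the behaviour this docstring describes, using a hypothetical three-column file (the data string is made up for illustration) whose data rows end with a trailing delimiter:

```python
import io

import pandas as pd

# Hypothetical data: the header has 3 fields, but each data row has 4
# because of the trailing comma.
data = "a,b,c\n4,apple,bat,\n8,orange,cow,\n"

# Default (index_col=None): the body has more fields than the header, so
# pandas uses the first column as the index and the labels shift left.
shifted = pd.read_csv(io.StringIO(data))

# index_col=False: the first column stays as data and the empty trailing
# field is dropped.
fixed = pd.read_csv(io.StringIO(data), index_col=False)

print(shifted)
print(fixed)
```

In shifted, 4 and 8 become the index, column a holds apple/orange, b holds bat/cow and c is all NaN, which is the same pattern as in the question; fixed keeps a, b and c aligned with their values.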