Kingle - 2 months ago
Python Question

pd.read_csv ignores columns that don't have headers

I have a .csv file that is generated by a third-party program. The data in the file is in the following format:

%m/%d/%Y 49.78 85 6 15
03/01/1984 6.63368 82 7 9.8 34.29056405 2.79984079 2.110346498 0.014652412 2.304545521 0.004732732
03/02/1984 6.53368 68 0 0.2 44.61471002 3.21623666 2.990408898 0.077444779 2.793385466 0.02661873
03/03/1984 4.388344 55 6 0 61.14463457 3.637231063 3.484310818 0.593098236 3.224973641 0.214360796


There are 5 column headers (row 1 in Excel, columns A-E) but 11 columns in total: row 1 is empty for columns F-K, while rows 2-N contain float values for all columns A-K.

I was not sure how to paste the .csv lines in so they are easily reproducible, sorry for that. An image of the Excel sheet is shown here: Excel sheet to read in

When I use the following code:

FWInds=pd.read_csv("path.csv")


or:

FWInds=pd.read_csv("path.csv", header=None)


the resulting dataframe FWInds does not contain the last 6 columns - it only contains the columns with headers (columns A-E from Excel, with column A used as the index).

FWInds.shape
Out[48]: (245, 4)


Ultimately the last 6 columns are the only ones I even want to read in.

I also tried:

FWInds=pd.read_csv('path.csv', header=None, index_col=False)


but got the following error

CParserError: Error tokenizing data. C error: Expected 5 fields in line 2, saw 11


I also tried to ignore the first row since the column titles are unimportant:

FWInds=pd.read_csv('path.csv', header=None, skiprows=0)


but got the same error.

Also no luck with the usecols parameter; it doesn't seem to understand that I'm referring to the column numbers (not names), unless I'm doing it wrong:

FWInds=pd.read_csv('path.csv', header=None, usecols=[5,6,7,8,9,10])


Any tips? I'm sure it's an easy fix but I'm very new to python.

Jan
Answer

There are a couple of parameters that can be passed to pd.read_csv(). Passing an explicit list of names tells the parser how many columns to expect, so it stops inferring the count from the short header line:

import pandas as pd
colnames = list('ABCDEFGHIJK')  # 11 names, one per column A-K
df = pd.read_csv('test.csv', sep='\t', names=colnames)

With this, I can actually import your data quite fine (and it is accessible via e.g. df['K'] afterwards).
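Since only the last six columns are wanted, names can be combined with usecols (once names are given, usecols accepts those labels) and skiprows=1 to drop the header line. A sketch using an inline io.StringIO stand-in for the file, assuming it is tab-separated as in the answer:

```python
import io
import pandas as pd

# Inline stand-in for the real file (tab-separated, header line plus two data rows)
sample = (
    "%m/%d/%Y\t49.78\t85\t6\t15\n"
    "03/01/1984\t6.63368\t82\t7\t9.8\t34.29056405\t2.79984079\t2.110346498\t0.014652412\t2.304545521\t0.004732732\n"
    "03/02/1984\t6.53368\t68\t0\t0.2\t44.61471002\t3.21623666\t2.990408898\t0.077444779\t2.793385466\t0.02661873\n"
)

colnames = list('ABCDEFGHIJK')          # 11 names -> the parser expects 11 fields
df = pd.read_csv(io.StringIO(sample), sep='\t',
                 names=colnames,        # explicit names disable header inference
                 usecols=colnames[5:],  # keep only columns F-K
                 skiprows=1)            # drop the header line (skiprows=0 skips nothing)
print(df.shape)
```

Positional indices (usecols=[5, 6, 7, 8, 9, 10]) also work here, but only because names already fixed the column count at 11; without names, the parser decides there are just 5 columns and the higher indices fail.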