Vinodini Natrajan - 6 days ago
Python Question

pandas failing with variable columns

My file looks like this:


4 7 a a
s g 6 8 0 d
g 6 2 1 f 7 9
f g 3
1 2 4 6 8 9 0


I am using pandas to load it into a DataFrame, but I get the following error:

pandas.parser.CParserError: Error tokenizing data. C error: Expected 6 fields in line 3, saw 8


The code I used was:

import pandas as pd
file = pd.read_csv("a.txt", dtype=None, delimiter=" ")


Can anyone suggest a way to read the file as it is, even though the rows have different numbers of columns?

Answer

Here's one way.

In [50]: !type temp.csv
4,7,a,a
s,g,6,8,0,d
g,6,2,1,f,7,9
f,g,3
1,2,4,6,8,9,0

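Read the CSV into a list of lists, then convert that to a DataFrame. pandas pads the shorter rows with None. (This example uses a comma-separated copy of the file; for the space-separated original, split on whitespace instead.)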

In [51]: pd.DataFrame([line.strip().split(',') for line in open('temp.csv', 'r')])
Out[51]:
   0  1  2     3     4     5     6
0  4  7  a     a  None  None  None
1  s  g  6     8     0     d  None
2  g  6  2     1     f     7     9
3  f  g  3  None  None  None  None
4  1  2  4     6     8     9     0
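
The same idea can be applied directly to the space-separated file from the question. The following is only a minimal sketch: `read_ragged` is a hypothetical helper name (not part of pandas), and the sample data is held in an in-memory `io.StringIO` buffer so the snippet runs without `a.txt` on disk.

```python
import io

import pandas as pd


def read_ragged(handle, sep=None):
    """Build a DataFrame from lines that have differing field counts.

    `read_ragged` is a hypothetical helper, not a pandas function.
    sep=None splits on any run of whitespace; pass ',' for a CSV.
    Blank lines are skipped.
    """
    rows = [line.strip().split(sep) for line in handle if line.strip()]
    # pandas pads the shorter rows with missing values.
    return pd.DataFrame(rows)


# Sample data from the question, kept in memory for the sketch.
sample = io.StringIO(
    "4 7 a a\n"
    "s g 6 8 0 d\n"
    "g 6 2 1 f 7 9\n"
    "f g 3\n"
    "1 2 4 6 8 9 0\n"
)

df = read_ragged(sample)
print(df)
```

With a real file, something like `with open("a.txt") as fh: df = read_ragged(fh)` should work the same way and also closes the file afterwards. Note that every value comes back as a string, so numeric columns would need converting separately, for example with `pd.to_numeric`.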