Miyashita Hikaru Miyashita Hikaru - 2 months ago 28
Python Question

how to read strange csv file in pandas

I would like to read sample csv file shown in below

--------------
|A|B|C|
--------------
|1|2|3|
--------------
|4|5|6|
--------------
|7|8|9|
--------------


I tried

pd.read_csv("sample.csv",sep="|")


but it didnt work well.
how can I read this csv ?

Answer

You can add parameter comment to read_csv and then remove columns with NaN by dropna:

import pandas as pd
import io

temp=u"""--------------
|A|B|C|
--------------
|1|2|3|
--------------
|4|5|6|
--------------
|7|8|9|
--------------"""
#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp), sep="|", comment='-').dropna(axis=1, how='all')

print (df)
   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9

More general solution:

import pandas as pd
import io

temp=u"""--------------
|A|B|C|
--------------
|1|2|3|
--------------
|4|5|6|
--------------
|7|8|9|
--------------"""
#after testing replace io.StringIO(temp) to filename
#separator is char which is NOT in csv
df = pd.read_csv(io.StringIO(temp), sep="^", comment='-')

#remove first and last | in data and in column names
df.iloc[:,0] = df.iloc[:,0].str.strip('|') 
df.columns = df.columns.str.strip('|')
#split column names
cols = df.columns.str.split('|')[0]
#split data
df = df.iloc[:,0].str.split('|', expand=True)
df.columns = cols
print (df)
   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9