bmello - 8 months ago
Python Question

How to skip an unknown number of empty lines before header on pandas.read_csv?

I want to read a dataframe from a csv file where the header is not in the first line. For example:

In [1]: import pandas as pd

In [2]: import io

In [3]: temp=u"""#Comment 1
   ...: #Comment 2
   ...: 
   ...: #The previous line is empty
   ...: Header1|Header2|Header3
   ...: 1|2|3
   ...: 4|5|6
   ...: 7|8|9"""

In [4]: df = pd.read_csv(io.StringIO(temp), sep="|", comment="#",
...: skiprows=4).dropna()

In [5]: df
   Header1  Header2  Header3
0        1        2        3
1        4        5        6
2        7        8        9

[3 rows x 3 columns]

The problem with the above code is that I don't know how many lines will exist before the header, so I cannot use skiprows=4 as I did here.

I am aware that I can iterate through the file, as in the question "Read pandas dataframe from csv beginning with non-fix header".

What I am looking for is a simpler solution, like making pandas.read_csv() disregard any empty line and take the first non-empty line as the header.


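You need to set skip_blank_lines=True (this is also the pandas default) and drop skiprows. Together with comment="#", blank lines and fully commented lines are then ignored, and the first remaining line is used as the header: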

df = pd.read_csv(io.StringIO(temp), sep="|", comment="#", skip_blank_lines=True).dropna()
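
To check that this really is independent of how many lines precede the header, here is a minimal sketch that parses the same table behind two preambles of different lengths. The helper name read_with_preamble and both preamble strings are made up for illustration; only the read_csv arguments come from the answer above.

```python
import io

import pandas as pd


def read_with_preamble(text, sep="|", comment="#"):
    """Parse delimited text whose header follows any number of blank or comment lines."""
    # skip_blank_lines=True drops empty lines and comment="#" drops fully
    # commented lines, so header=0 resolves to the first remaining line.
    return pd.read_csv(
        io.StringIO(text),
        sep=sep,
        comment=comment,
        skip_blank_lines=True,
        header=0,
    )


body = "Header1|Header2|Header3\n1|2|3\n4|5|6\n7|8|9"

# Two hypothetical preambles of different lengths before the same table.
short_preamble = "#Comment 1\n\n" + body
long_preamble = "#Comment 1\n#Comment 2\n\n\n#Another comment\n\n" + body

df_short = read_with_preamble(short_preamble)
df_long = read_with_preamble(long_preamble)

print(df_long)
print(df_short.equals(df_long))
```

Both calls should yield the same three-row frame. Note that skiprows is deliberately left out: it counts physical lines, including comments and blanks, which is exactly what made the original call fragile.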