I am using Pandas library and Python.
I have an Excel file that has some heading information on the top of an Excel sheet which I do not need for data extraction.
But, the heading information could take longer rows, so it is unpredictable how long it could be.
So, my data extraction should start from where it says "ID"...
For this particular case, it starts from row 5, but it could change.
The image is shown on the bottom (I grayed out after row 5 for sensitive info).
How do I put this in logic (to skip heading and jump to row 5)?
The pattern should be, row heading starts from "ID, EMP_ID" etc.
with open('File.xls') as fp:
skip = next(filter(
lambda x: x.startswith('ID'),
df = pd.read_excel('File.xls', usercols=['ID', 'EMP_ID'], skiprows=skip)
You could manually check for the header line and then use
read_csvs keyword argument
with open('data.csv') as fp: skip = next(filter( lambda x: x.startswith('ID'), enumerate(fp) ))
Then skip the rows:
df = pandas.read_csv('data.csv', skiprows=skip)
Like that you can support pre-header sections of arbitrary length.
For Python 2:
import itertools as it with open('data.csv') as fp: skip = next(it.ifilter( lambda x: x.startswith('ID'), enumerate(fp) ))