brandog brandog - 1 year ago 40
Python Question

Create dataframe of all rows AFTER varying amount of header data Python Pandas

I have dataframes with a varying amount of header data.
I need to remove the header data (i.e. create a new dataframe containing only the data that comes after this header).

I have used the following code to find the row where the header data ends.

df = xlsx_file.parse('ActualSheet', header=None)
value_list = ['var1','var2']
df_Header = df[df[0].isin(value_list) & (df[1].isin(value_list))]

The above code works and creates a dataframe of the final row of header data.

I am having trouble creating a new dataframe from the original data that only includes the rows AFTER this "df_Header" row.

Any help is appreciated. I know the answer is already out there, but I could not find it.

Answer Source

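IIUC you can do it in two steps. First keep everything from the last header row onward (the slice start is inclusive, so the header row itself is still there):

df = df[df_Header.index.max():]

Then drop the header row itself:

df = df[~(df[0].isin(value_list) & (df[1].isin(value_list)))]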
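
For illustration, here is a minimal sketch of the same idea run end to end on made-up data. The preamble rows and values are hypothetical stand-ins for the parsed sheet, and the `+ 1` slice is a swapped-in variant that skips the header row in one step instead of filtering it out afterwards. It assumes the default integer index produced by header=None and that at least one header row is found.

```python
import pandas as pd

# Hypothetical stand-in for xlsx_file.parse('ActualSheet', header=None):
# two preamble rows, then the row holding 'var1'/'var2', then the data.
df = pd.DataFrame([
    ["Report", None],
    ["Generated", "2016"],
    ["var1", "var2"],
    [1, 2],
    [3, 4],
])
value_list = ["var1", "var2"]

# Row(s) where both columns hold one of the expected header names.
df_Header = df[df[0].isin(value_list) & df[1].isin(value_list)]

# Index label of the last header row; with header=None the index is a
# default RangeIndex, so the label is also the row position.
last_header = df_Header.index.max()

# Keep only the rows after the header row.
data = df.iloc[last_header + 1:].reset_index(drop=True)

# Optionally reuse the header row's values as column names.
data.columns = df.loc[last_header].tolist()
```

If no row matches value_list, df_Header is empty and index.max() returns NaN, so it may be worth checking df_Header.empty before slicing.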

PS: you may also want to use the header and/or skiprows parameters of the read_excel() function.
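
A hedged sketch of the skiprows route: read once without a header to find the header row, then read again and skip the preamble. To stay runnable without an Excel file it uses read_csv on invented in-memory text; read_excel accepts header and skiprows parameters in the same way, so the assumption is that the second read would look the same there.

```python
import io
import pandas as pd

# Invented sample: two preamble lines, then the 'var1,var2' header row.
raw = "Report,\nGenerated,2016\nvar1,var2\n1,2\n3,4\n"
value_list = ["var1", "var2"]

# First pass: read everything without a header to locate the header row.
probe = pd.read_csv(io.StringIO(raw), header=None)
is_header = probe[0].isin(value_list) & probe[1].isin(value_list)
header_row = int(is_header[is_header].index.max())

# Second pass: skip the preamble lines and use the next row as the header.
data = pd.read_csv(io.StringIO(raw), skiprows=header_row, header=0)
```

With a real workbook the second pass would presumably be something like pd.read_excel(path, sheet_name='ActualSheet', skiprows=header_row, header=0), where path is a placeholder for your file. Reading twice costs an extra parse, but it lets pandas infer proper column names and dtypes for the data rows.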