I have a file (example shown below) that contains multiple CSV tables. This file is uploaded to a database. I would like to do some operations on this file. For that, I was thinking of using pandas to read each table into a separate dataframe with the read_csv function. However, going through the documentation, I didn't see an option to specify a subset of lines to read/parse. Is this possible? If not, are there other alternatives?
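If you want to parse only specific lines, you can use the skiprows parameter, which accepts a list of 0-based line numbers to skip. You can also combine it with the nrows parameter, as @Richard Telford mentioned in the comment: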
df = pd.read_csv(filename, header=None, names=['col1','col2','col3'], skiprows=[0,1,5,16,57,58,59])
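To illustrate how skiprows and nrows can be combined to pull a single table out of a multi-table file, here is a minimal sketch. The file layout (a title line before each table) and the names raw, table_a and table_b are assumptions made up for this illustration, not taken from the question's file:

```python
import io
import pandas as pd

# Hypothetical content: two CSV tables, each preceded by a title line.
raw = """\
table A
x,y
1,2
3,4
table B
p,q
5,6
7,8
"""

# First table: skip the title (line 0), take the next line as the header,
# then read exactly 2 data rows.
table_a = pd.read_csv(io.StringIO(raw), skiprows=[0], nrows=2)

# Second table: its header is on line 5, so skip lines 0-4 and read 2 rows.
table_b = pd.read_csv(io.StringIO(raw), skiprows=list(range(5)), nrows=2)

print(table_a)
print(table_b)
```

This approach requires knowing the line offsets of each table in advance; for a real file you would pass the filename instead of the io.StringIO object.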
Here is a small example that reads from an in-memory "buffer" (io.StringIO) instead of a file:
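import io
import pandas as pd

# Sample data: one header line, then whitespace-separated index/value rows
data = """\
Name
0 JP2015121
1 US14822
2 US14358
3 JP2015539
4 JP2015156
"""

# sep=r'\s+' splits on any whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(io.StringIO(data), sep=r'\s+', index_col=0)
print(df)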
The same without a header row:
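# Reuses io and pd imported in the previous example
data = """\
0 JP2015121
1 US14822
2 US14358
3 JP2015539
4 JP2015156
"""

# header=None: the data has no header line, so supply the column name via names
df = pd.read_csv(io.StringIO(data), sep=r'\s+', index_col=0, header=None, names=['Name'])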
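The buffer approach can also be turned into an alternative for the multi-table case: read the whole file as text, split it into chunks, and parse each chunk separately. The sketch below assumes the tables are separated by a single blank line, which may not match your file; raw, chunks and tables are illustrative names:

```python
import io
import pandas as pd

# Hypothetical content: two CSV tables separated by one blank line.
raw = """\
x,y
1,2
3,4

p,q,r
5,6,7
"""

# Split the text on blank lines and drop any empty chunks.
chunks = [chunk for chunk in raw.split("\n\n") if chunk.strip()]

# Parse each chunk as its own CSV table via an in-memory buffer.
tables = [pd.read_csv(io.StringIO(chunk)) for chunk in chunks]

for table in tables:
    print(table)
```

Unlike skiprows/nrows, this does not need line offsets, but it loads the entire file into memory and depends on a reliable separator between tables.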