Python Question

How to resolve pandas memory issues while reading big CSV files

I have a 100 GB CSV file with millions of rows. I need to read it, say, 10,000 rows at a time into a pandas DataFrame and write each chunk to SQL Server.

I have tried chunksize as well as iterator, as suggested at http://pandas-docs.github.io/pandas-docs-travis/io.html#iterating-through-files-chunk-by-chunk, and have gone through many similar questions, but I am still getting an out-of-memory error.
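
For reference, the iterator pattern from that documentation looks roughly like this (a minimal sketch; filename is a placeholder for the path to the CSV):

import pandas as pd

reader = pd.read_csv(filename, iterator=True)
while True:
    try:
        # get_chunk returns the next block of rows as a DataFrame
        chunk = reader.get_chunk(10000)
    except StopIteration:
        break
    # process chunk here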

Can you suggest code to read very large CSV files into a pandas DataFrame iteratively?

Answer

Demo:

import pandas as pd
# Read 100,000 rows at a time; each chunk is appended to the SQL table
for chunk in pd.read_csv(filename, chunksize=10**5):
    chunk.to_sql('table_name', conn, if_exists='append')

where conn is a SQLAlchemy engine created with sqlalchemy.create_engine(...), and filename is the path to your CSV file.
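
A more complete sketch, assuming SQL Server is reached through pyodbc; the connection string, file name, table name, and chunk size below are placeholders to adjust for your setup:

import pandas as pd
import sqlalchemy

# Hypothetical SQL Server connection string; change driver/credentials as needed
engine = sqlalchemy.create_engine(
    'mssql+pyodbc://user:password@server/database?driver=ODBC+Driver+17+for+SQL+Server'
)

# Stream the CSV in fixed-size chunks so only one chunk is in memory at a time
for chunk in pd.read_csv('big_file.csv', chunksize=10000):
    # Append each chunk to the target table; index=False skips the DataFrame index column
    chunk.to_sql('table_name', engine, if_exists='append', index=False)

Only one chunk is held in memory at a time, so the read chunk size, not the total file size, determines memory use; if the process still runs out of memory, try a smaller chunksize.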