c_david c_david - 24 days ago 6
Python Question

Fastest file format for read/write operations with Pandas and/or Numpy

I've been working for a while with very large DataFrames and I've been using the csv format to store input data and results. I've noticed that a lot of time goes into reading and writing these files which, for example, dramatically slows down batch processing of data. I was wondering if the file format itself is of relevance. Is there a
preferred file format for faster reading/writing Pandas DataFrames and/or Numpy arrays?

Answer

Use HDF5. Beats writing flat files hands down. And you can query. Docs are here

Here's a perf comparison vs SQL. Updated to show SQL/HDF_fixed/HDF_table/CSV write and read perfs.

Docs now include a performance section:

See here