I have a dataframe "DF" with with 500,000 rows. Here are the data types per column:
13 with open(path, 'wb') as f:
---> 14 pkl.dump(obj, f, protocol=pkl.HIGHEST_PROTOCOL)
OSError: [Errno 22] Invalid argument
with open('my_filename.pkl', 'wb') as f:
Probably not the answer you were hoping for but this is what I did......
Split the dataframe into smaller chunks using np.array_split (although numpy functions are not guaranteed to work, it does now, although there used to be a bug for it).
Then pickle the smaller dataframes.
When you unpickle them use pandas.append or pandas.concat to glue everything back together.
I agree it is a fudge and suboptimal. If anyone can suggest a "proper" answer I'd be interested in seeing it, but I think it as simple as dataframes are not supposed to get above a certain size.