bobleponge - 6 days ago
Python Question

DataFrame.to_dict() is not always invertible

My main point is that:

assert_frame_equal(DataFrame.from_dict(df.to_dict()), df)


fails in some cases. I would love to provide a reproducible example, but (i) the data is too big to post, and (ii) I would need to provide a serialized DataFrame (which is precisely where this fails...).

Is this a known issue? Am I doing something wrong?

Answer

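One reason this can fail is that df.to_dict() creates a Python dictionary. Before Python 3.7, the language did not guarantee that dictionary keys come back in any particular order, so the columns of the rebuilt DataFrame can end up in a different order from the original.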

The DataFrame's column names are mapped to the dictionary keys and, as per this question, order of columns matters when testing DataFrame equality.

This fact is easy to check:

>>> import pandas as pd
>>> df = pd.DataFrame(columns=['a', 'c', 'b'])
>>> pd.testing.assert_frame_equal(df, pd.DataFrame(df.to_dict()))
# AssertionError

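assert_frame_equal accepts a number of keyword arguments that specify which criteria to check or ignore. For example, check_like=True ignores the order of the index and columns, and check_names (True by default) controls whether the names attribute of the index and columns is compared.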
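To make the column-order point concrete, here is a minimal sketch of two ways to deal with it: relaxing the comparison, or restoring the original column order after the round trip. The sample data and the variable names (shuffled, restored, strict_ok) are made up for illustration, and the reordering is simulated by sorting the columns rather than relying on a particular Python or pandas version to scramble them.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Illustrative data only; the column order is deliberately not alphabetical.
df = pd.DataFrame({'a': [1, 2], 'c': [3, 4], 'b': [5, 6]})

# Simulate a round trip that loses the column order by sorting the columns.
shuffled = df[sorted(df.columns)]

# Strict comparison: same data, different column order, so this raises.
try:
    assert_frame_equal(df, shuffled)
    strict_ok = True
except AssertionError:
    strict_ok = False

# Option 1: relax the test. check_like=True ignores index/column order.
assert_frame_equal(df, shuffled, check_like=True)

# Option 2: fix the data. Rebuild from the dict, then reselect the columns
# in the original order before comparing strictly.
restored = pd.DataFrame.from_dict(df.to_dict())[df.columns]
assert_frame_equal(df, restored)
```

Option 2 only addresses column order. If the round trip also changes dtypes or the index, the strict comparison can still fail, and the other check_* keyword arguments are the place to look.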
