Amelio Vazquez-Reina - 25 days ago
Python Question

Mixed types when reading CSV files: causes, fixes and consequences

What exactly happens when Pandas issues this warning? Should I worry about it?

In [1]: read_csv(path_to_my_file)
/Users/josh/anaconda/envs/py3k/lib/python3.3/site-packages/pandas/io/parsers.py:1139:
DtypeWarning: Columns (4,13,29,51,56,57,58,63,87,96) have mixed types. Specify dtype option on import or set low_memory=False.

data = self._reader.read(nrows)


I assume this means that Pandas cannot infer a single type from the values in those columns. But if that is the case, what type does Pandas end up using for those columns?

Also, can the type always be recovered after the fact (after getting the warning), or are there cases where I may not be able to recover the original information correctly and should specify the type up front?

Finally, how exactly does low_memory=False fix the problem?

Answer

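As far as I know, low_memory is not formally deprecated, but I wouldn't rely on it. With the default low_memory=True, pandas parses the file in chunks and infers a type for each chunk, which is how a single column can end up with mixed types. Setting low_memory=False makes pandas look at the whole column before deciding on a type, at the cost of more memory while parsing. Specifying the type yourself is the more robust fix.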

The warning means that some of the values in a column have one dtype (e.g. str), and some have a different dtype (e.g. float). I believe pandas uses the lowest common super type, which in the example I used would be object.
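
To make that concrete, here is a minimal sketch, assuming a made-up two-column CSV (the column names and the "unknown" token are hypothetical, not taken from the question). It only shows that one stray non-numeric value stops pandas from giving the column a numeric type. It does not try to reproduce the DtypeWarning itself, which depends on how the file is split into chunks.

```python
import io

import pandas as pd

# Hypothetical data: "value" is numeric except for one stray token.
csv_text = "id,value\n1,1.5\n2,unknown\n3,2.5\n"

df = pd.read_csv(io.StringIO(csv_text))

# "id" is inferred as an integer column; "value" is not numeric at all.
print(df.dtypes)
print(pd.api.types.is_numeric_dtype(df["value"]))
```

On the pandas versions I have used, the value column is reported as object here; newer releases may report a string dtype instead.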

You should check your data, or post some of it here. In particular, look for missing values or inconsistently formatted int/float values. If you are certain your data is correct, then use the dtype parameter of read_csv to tell pandas which type each column should have.
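
One way to do that check, sketched under the assumption of the same kind of made-up data (the column names and the "unknown" placeholder are hypothetical): read the suspect column as plain strings so nothing is guessed, then see which entries refuse to convert to numbers.

```python
import io

import pandas as pd

# Hypothetical data: one non-numeric placeholder in an otherwise float column.
csv_text = "id,value\n1,1.5\n2,unknown\n3,2.5\n"

# Read the suspect column as text so pandas does not guess a type.
df = pd.read_csv(io.StringIO(csv_text), dtype={"value": str})

# Try to convert to numbers; anything that cannot be parsed becomes NaN.
as_number = pd.to_numeric(df["value"], errors="coerce")

# Rows whose raw text was present but not numeric: these need a look.
bad_rows = df[as_number.isna() & df["value"].notna()]
print(bad_rows)

# Once the offending token is known, declare it as missing and fix the dtype.
clean = pd.read_csv(
    io.StringIO(csv_text),
    na_values=["unknown"],
    dtype={"value": float},
)
print(clean.dtypes)
```

Reading as text first keeps the original strings intact, so nothing is lost if the column turns out not to be purely numeric.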