Gyan Veda - 5 months ago
Python Question

Pandas DataFrame stored list as string: How to convert back to list?

I have an n-by-m Pandas DataFrame df defined as follows. (I know this is not the best way to do it. It makes sense for what I'm trying to do in my actual code, but that would be TMI for this post, so just take my word that this approach works in my particular scenario.)

>>> df = DataFrame(columns=['col1'])
>>> df.append(Series([None]), ignore_index=True)
>>> df
Empty DataFrame
Columns: [col1]
Index: []


I stored lists in the cells of this DataFrame as follows.

>>> df['col1'][0] = [1.23, 2.34]
>>> df
           col1
0  [1.23, 2.34]


For some reason, the DataFrame stored this list as a string instead of a list.

>>> df['col1'][0]
'[1.23, 2.34]'


I have 2 questions for you.


  1. Why does the DataFrame store a list as a string and is there a way around this behavior?

  2. If not, then is there a Pythonic way to convert this string into a list?






Solution

As other users have pointed out in answers and comments, this situation cannot be replicated easily. As it turns out, the DataFrame itself does not format a list as a string. The DataFrame I was using had been saved to and loaded from a CSV file. That round trip through CSV, rather than the DataFrame itself, converted the list into its string representation.

Thank you all for making me realize the cause of this strange behavior. And thank you, Alex Thornton, for teaching me about the literal_eval function, which seems super helpful and will certainly come in handy in the future!

Answer

As you pointed out, this commonly happens when saving and loading pandas DataFrames as .csv files, because CSV is a text format.

In your case this happened because list objects have a string representation, and that representation is what gets written to the .csv file. Loading the .csv then yields the string rather than the original list.
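The round trip described above can be reproduced in a few lines. This is only a minimal sketch: it assumes a one-cell DataFrame built directly from a dict (not the construction used in the question) and writes the CSV to an in-memory buffer instead of a file.

```python
import io

import pandas as pd

# A one-cell frame whose single value is a real Python list.
df = pd.DataFrame({"col1": [[1.23, 2.34]]})
original_type = type(df["col1"][0])

# Write the frame to an in-memory CSV, then read it back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

# After the round trip the cell holds the list's text form, not a list.
loaded_value = loaded["col1"][0]
print(original_type.__name__, type(loaded_value).__name__, repr(loaded_value))
```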

If you want to store the actual objects, you should use DataFrame.to_pickle() (note: the objects must be picklable!).
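As a rough illustration of the pickle route, the sketch below saves a frame with DataFrame.to_pickle() and reloads it with pandas.read_pickle(). The file name frame.pkl and the temporary directory are assumptions made for the example, not anything from the question.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"col1": [[1.23, 2.34]]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this example.
    path = os.path.join(tmp_dir, "frame.pkl")
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# Pickle stores the Python object itself, so the cell is still a list.
restored_value = restored["col1"][0]
print(type(restored_value).__name__, restored_value)
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.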

To answer your second question, you can convert it back with ast.literal_eval:

>>> from ast import literal_eval
>>> literal_eval('[1.23, 2.34]')
[1.23, 2.34]
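To apply the same idea to a whole column rather than a single string, one possible approach is sketched below. The CSV text and the column name col1 are made up for the example. It shows two options: parsing while loading via the converters argument of read_csv, and parsing afterwards with Series.apply.

```python
import io
from ast import literal_eval

import pandas as pd

# Example CSV text in which each cell is the text form of a list.
csv_text = 'col1\n"[1.23, 2.34]"\n"[3.0, 4.5]"\n'

# Option 1: parse the column while loading.
parsed_on_load = pd.read_csv(
    io.StringIO(csv_text), converters={"col1": literal_eval}
)

# Option 2: load the strings first, then convert the column.
parsed_after = pd.read_csv(io.StringIO(csv_text))
parsed_after["col1"] = parsed_after["col1"].apply(literal_eval)

print(parsed_on_load["col1"].tolist())
print(parsed_after["col1"].tolist())
```

Both options should leave real list objects in the column. literal_eval only accepts Python literals, so it raises an error on cells that are not valid literals instead of evaluating arbitrary code.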