snapcrack snapcrack - 1 year ago 20
Python Question

Function now prints every letter instead of every word

I have data that looks like this:

owned  category                            weight  mechanics_split
28156  Environmental, Medical              2.8023  [Action Point Allowance System, Co-operative P...
 9269  Card Game, Civilization, Economic   4.3073  [Action Point Allowance System, Auction/Biddin...
36707  Modern Warfare, Political, Wargame  3.5293  [Area Control / Area Influence, Campaign / Bat...


and used this function (taken from the generous answer in this question):

    def owned_nums(games):
        for row in games.iterrows():
            owned_value = row[1]['owned']
            mechanics = row[1]['mechanics_split']
            for type_string in mechanics:
                game_types.loc[type_string, ['owned']] += owned_value


to iterate over the values in the dataframe and put new values in a new dataframe called game_types. It worked great. In fact, it still works great; that notebook is open, and if I change the last line of the function to just print (type_string), it prints:

Action Point Allowance System
Co-operative Play
Hand Management
Point to Point Movement
Set Collection
Trading
Variable Player Powers
Action Point Allowance System...


Okay, perfect. So, I saved my data as a csv, opened a new notebook, opened the csv with the columns with the split strings, copied and pasted the exact same function, and when I print type_string, I now get:

[
'
A
c
t
i
o
n

P
o
i
n
t

A
l
l
o
w


The only thing I could notice is that the original lists were quote-less, with [Action Point Allowance System, Co-operative...] etc., and the new dataframe opened from the new csv was rendered as ['Action Point Allowance System', 'Co-operative...'], with quotes. I used str.replace("'","") which got rid of the quotes, but it's still returning every letter. I've tried experimenting with the escapechar option in to_csv, but to no avail. I'm very confused as to what setting I need to tweak.

Thanks very much for any help.

DSM DSM
Answer Source

The only way the code

    mechanics =  row[1]['mechanics_split']
    for type_string in mechanics:
        game_types.loc[type_string, ['owned']] += owned_value

can have worked is if your mechanics_split column contained not a string but an iterable containing strings.

Storing non-scalar data in Series is not well-supported, and while it's sometimes useful (though slow) as an intermediate step, it's not supposed to be something you do regularly. Basically what you're doing is

>>> df = pd.DataFrame({"A": [["x","y"],["z"]]})
>>> df.to_csv("a.csv")
>>> !cat a.csv
,A
0,"['x', 'y']"
1,['z']

after which you have

>>> df2 = pd.read_csv("a.csv", index_col=0)
>>> df2
            A
0  ['x', 'y']
1       ['z']
>>> df.A.values
array([['x', 'y'], ['z']], dtype=object)
>>> df2.A.values
array(["['x', 'y']", "['z']"], dtype=object)
>>> type(df.A.iloc[0])
<class 'list'>
>>> type(df2.A.iloc[0])
<class 'str'>

and you notice that what was originally a Series containing lists of strings is now a Series containing only strings. Which makes sense, if you think about it, because CSVs never claimed to be type-preserving.
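
That type change is enough to explain the one-letter-per-line output: a for loop over a list yields its elements, while a for loop over a str yields its characters. A minimal sketch, reusing the toy frame from above and an in-memory buffer in place of a.csv (the buffer and the variable names are assumptions made for illustration):

```python
import io

import pandas as pd

df = pd.DataFrame({"A": [["x", "y"], ["z"]]})

# Round-trip through CSV text, using an in-memory buffer instead of a file.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
df2 = pd.read_csv(buf, index_col=0)

# Iterating the original cell walks over list elements.
items_from_list = [item for item in df.A.iloc[0]]

# Iterating the reloaded cell walks over the characters of a string.
chars_from_string = [item for item in df2.A.iloc[0]]

print(items_from_list)        # the two original strings
print(chars_from_string[:4])  # starts with the bracket and quote characters
```

Removing the quote characters with str.replace does not change this, because the cell is still a single string.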

If you insist on using a frame like this, you should manually encode and decode your lists via some representation (e.g. JSON strings) on reading and writing. I'm too lazy to confirm what pandas does to str-ify lists, but you might be able to get away with applying ast.literal_eval to the resulting strings to turn them back into lists.
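
Both suggestions can be sketched as follows, again on the toy frame rather than the real data. The in-memory buffers and the names restored, encoded and restored_json are assumptions made for illustration. The first option relies on the CSV holding the list exactly as pandas wrote it, quotes included, which held for the toy frame above.

```python
import ast
import io
import json

import pandas as pd

df = pd.DataFrame({"A": [["x", "y"], ["z"]]})

# Option 1: write the lists as-is, then parse the stringified lists
# back into Python lists while reading.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
restored = pd.read_csv(buf, index_col=0, converters={"A": ast.literal_eval})

# Option 2: encode each list as a JSON string before writing,
# and decode it with json.loads while reading.
encoded = df.assign(A=df["A"].apply(json.dumps))
buf = io.StringIO()
encoded.to_csv(buf)
buf.seek(0)
restored_json = pd.read_csv(buf, index_col=0, converters={"A": json.loads})

print(type(restored["A"].iloc[0]))
print(type(restored_json["A"].iloc[0]))
```

Note that ast.literal_eval needs the quotes to be present: once they have been stripped with str.replace("'",""), the text is no longer a valid Python literal and cannot be parsed this way.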
