student - 1 month ago
Python Question

How to "expand" a pandas dataframe from string/character?

I generated a pandas dataframe from a list of lists:

In:

import pandas as pd

lis = [['baby ferrets\ntype: mamal\n»age:2\n»food: Renal'],
       ['dog\ntype: mamal\n»age: 3 months\n»food: dog food'],
       ['cat\ntype: mamal\n»age: 2\n»food: cat food'],
       ['bobcat (exotic pet)\ntype: mamal\n»age: 1\n»food: meat'],
       ['iguana\ntype: reptile\n»age: 2\n»food: crickets']]

df = pd.DataFrame(lis)
df


Out:

0
0 baby ferrets\ntype: mamal\n»age:2\n»food: Renal
1 dog\ntype: mamal\n»age: 3 months\n»food: dog food
2 cat\ntype: mamal\n»age: 2\n»food: cat food
3 bobcat (exotic pet)\ntype: mamal\n»age: 1\n»food: meat
4 iguana\ntype: reptile\n»age: 2\n»food: crickets


How can I transform the previous dataframe into (*):

pet, type, age, food
0 baby ferrets, mammal, 2, Renal
1 dog, mammal, 3 months, dog food
2 cat, mammal, 2, cat food
3 bobcat (exotic pet), mammal, 1, meat
4 iguana, reptile, 2, crickets


When I created the pandas dataframe, I tried:

df = pd.DataFrame(lis, sep= '\n')


I also tried:

df['newcol'] = lis['pet'].str.extract('([A-Z]\w{0,})', expand=True)
df


However, I am not matching all the elements. Is it possible to get the (*) format with pandas?

Answer

This should work for parsing your column after it is loaded.

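def parse_col(r):
    # Split the cell on newlines, keep the text after the last ':' of each
    # piece, and strip the whitespace left behind by ': '.
    values = [i.split(':')[-1].strip() for i in r[0].split('\n')]
    return pd.Series(data=values, index=['name', 'type', 'age', 'food'])

df.apply(parse_col, axis=1)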

                  name     type       age      food
0         baby ferrets    mamal         2     Renal
1                  dog    mamal  3 months  dog food
2                  cat    mamal         2  cat food
3  bobcat (exotic pet)    mamal         1      meat
4               iguana  reptile         2  crickets

The parse_col function above can also be modified to parse the list prior to loading into a DataFrame.
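
As a rough sketch of that second option, the same splitting logic can be applied to each inner list before the DataFrame is built. The names parse_item and COLUMNS are hypothetical, and the sketch assumes every entry has exactly four newline-separated fields in the order name, type, age, food, as in the question's data:

```python
import pandas as pd

# Same input as in the question: a list of one-element lists.
lis = [['baby ferrets\ntype: mamal\n»age:2\n»food: Renal'],
       ['dog\ntype: mamal\n»age: 3 months\n»food: dog food'],
       ['cat\ntype: mamal\n»age: 2\n»food: cat food'],
       ['bobcat (exotic pet)\ntype: mamal\n»age: 1\n»food: meat'],
       ['iguana\ntype: reptile\n»age: 2\n»food: crickets']]

# Hypothetical column names, matching the index used by parse_col.
COLUMNS = ['name', 'type', 'age', 'food']

def parse_item(item):
    # Hypothetical helper: item is a one-element list. Split its string on
    # newlines and keep the stripped text after the last ':' of each piece.
    return [part.split(':')[-1].strip() for part in item[0].split('\n')]

# Parse every entry first, then load the clean rows into a DataFrame.
parsed = pd.DataFrame([parse_item(item) for item in lis], columns=COLUMNS)
print(parsed)
```

This avoids the row-wise apply, but it would raise an error on any entry that does not split into exactly four fields, so irregular data would need extra handling.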