yusuke0426 - 3 months ago
Python Question

Python: How to extract specified strings within [ ] brackets in a pandas dataframe and create new columns with boolean values

I'm new to programming and would appreciate any of your insights!

I have a data frame like this.

df;

                         info Price
0               [100:Sailing]  $100
1  [150:Boating, 100:Sailing]  $200
2               [200:Surfing]  $300


I would like to create new columns named after the activities in the info column, and put 1 in a new column if the corresponding name appears in that row's info value (0 otherwise). The result should look like the dataframe below.

  Price  Sailing  Boating  Surfing
0  $100        1        0        0
1  $200        1        1        0
2  $300        0        0        1


I tried the code below, but it did not work (even though this approach works on other columns).

df1 = df.info.str.extract(r'(Boating|Sailing|Surfing)',expand=False)
df2 = pd.concat([df,pd.get_dummies(df1).astype(int)],axis=1)


I have over ten thousand rows like this, so ideally I would like to write code that automatically extracts the specified strings (like Surfing) from the info column, creates a new column for each activity name, and fills it with 1 or 0 as shown above. I thought that maybe the brackets in the data or the data type in the dataframe are causing the problem, but I am not sure how to tackle this.

Answer

I assumed the values in the info column are strings formatted like a Python list.

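# Strip the brackets and spaces, then one-hot encode the comma-separated items
df1 = df['info'].str[1:-1].str.replace(' ', '').str.get_dummies(',')
# Keep only the activity name after the colon as the column label
df1.rename(columns=lambda x: x.rsplit(':')[-1], inplace=True)
# Attach the indicator columns to the original frame
df2 = pd.concat([df, df1.astype(int)], axis=1)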

df2
Out: 
                         info Price  Sailing  Boating  Surfing
0               [100:Sailing]  $100        1        0        0
1  [150:Boating, 100:Sailing]  $200        1        1        0
2               [200:Surfing]  $300        0        0        1
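
One likely reason the attempt in the question fails on this particular column: `df.info` resolves to the `DataFrame.info` method rather than the column named info, so bracket access (`df['info']`) is needed. In addition, `str.extract` only captures the first match per row, so a second activity in the same row would be missed. Below is a minimal, self-contained sketch of a variant that pulls out every activity name with a regular expression. It assumes, like the code above, that the values are strings such as `'[150:Boating, 100:Sailing]'`; the sample frame and the names `names`, `dummies`, and `result` are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to be plain strings).
df = pd.DataFrame({
    "info": ["[100:Sailing]", "[150:Boating, 100:Sailing]", "[200:Surfing]"],
    "Price": ["$100", "$200", "$300"],
})

# df.info is the DataFrame.info method, not the column,
# so the column has to be accessed as df["info"].
print(callable(df.info))

# Collect every word that follows a colon: one list of names per row.
names = df["info"].str.findall(r":\s*(\w+)")

# Join each list with "|" (the default separator of str.get_dummies)
# and expand it into one 0/1 indicator column per activity.
dummies = names.str.join("|").str.get_dummies()

# Keep Price and attach the indicator columns.
result = pd.concat([df[["Price"]], dummies], axis=1)
print(result)
```

Because the columns are built from the activity names alone, an activity that appears with different prices in different rows (say 100:Sailing and 120:Sailing) would still end up in a single Sailing column. The indicator columns come out in alphabetical order.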