kdragger kdragger -3 years ago 122
Python Question

How to use pandas extract method for multiple columns

I have a column in my dataframe that has price data but puts it all together as "price x size (num_orders)". For example, the column (as read in from CSV file) is:
127.3 x 13 (1)
I came across the Pandas extract method and I have it working, but I am only able to make it work for creating one column and a time. So I have it working, but I want to know how to do it all in one statement as I think that it will, at least, give me a bit of Pandas instruction.
This works:

df['price']=df.Level1.str.extract('(\d*\.\d*) x', expand=False)
df['size']=df.Level1.str.extract(' x (\d*)', expand=False)
df['num_orders'] = df.Level1.str.extract(' \((\d*)\)$', expand=False)


but this does NOT:

gcq7['price'], gcq7['size'], gcq7['num_orders'] =
gcq7.Level1.str.extract('(?P<price>\d*\.\d*) x (?P<size>\d*) \((?P<num_orders>\d*)\)$', expand=False)


I just get three columns that are titled correctly (price, size, num_orders) but have their column names as values all the way down, i.e., the 'price' column has "price" as the value for every row.

Help appreciated.

Edit:
consider the following column with index as input:

Date_Timestamp
2017-06-01 00:00:00.000 127.06000000 x 8 (0)
2017-06-01 00:00:00.000 127.38000000 x 8 (0)


And desired output:

price size num_orders
Date_Timestamp
2017-06-01 00:00:00.000 127.06000000 8 0
2017-06-01 00:00:00.000 127.38000000 8 0


Edited per suggestion and for future clarity.

Answer Source

IIUC you were almost there:

pat = r'(?P<price>\d*\.\d*)\s*x\s*(?P<size>\d*)\s*\((?P<num_orders>\d*)\)$'
gcq7[['price','size','num_orders']] = gcq7.Level1.str.extract(pat, expand=True)
# NOTE:                                                                   ^^^^
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download