kdragger kdragger -3 years ago 122
Python Question

How to use pandas extract method for multiple columns

I have a column in my dataframe that has price data but puts it all together as "price x size (num_orders)". For example, the column (as read in from CSV file) is:
127.3 x 13 (1)
I came across the Pandas extract method and I have it working, but I am only able to make it work for creating one column and a time. So I have it working, but I want to know how to do it all in one statement as I think that it will, at least, give me a bit of Pandas instruction.
This works:

df['price']=df.Level1.str.extract('(\d*\.\d*) x', expand=False)
df['size']=df.Level1.str.extract(' x (\d*)', expand=False)
df['num_orders'] = df.Level1.str.extract(' \((\d*)\)$', expand=False)

but this does NOT:

gcq7['price'], gcq7['size'], gcq7['num_orders'] =
gcq7.Level1.str.extract('(?P<price>\d*\.\d*) x (?P<size>\d*) \((?P<num_orders>\d*)\)$', expand=False)

I just get three columns that are titled correctly (price, size, num_orders) but have their column names as values all the way down, i.e., the 'price' column has "price" as the value for every row.

Help appreciated.

consider the following column with index as input:

2017-06-01 00:00:00.000 127.06000000 x 8 (0)
2017-06-01 00:00:00.000 127.38000000 x 8 (0)

And desired output:

price size num_orders
2017-06-01 00:00:00.000 127.06000000 8 0
2017-06-01 00:00:00.000 127.38000000 8 0

Edited per suggestion and for future clarity.

Answer Source

IIUC you were almost there:

pat = r'(?P<price>\d*\.\d*)\s*x\s*(?P<size>\d*)\s*\((?P<num_orders>\d*)\)$'
gcq7[['price','size','num_orders']] = gcq7.Level1.str.extract(pat, expand=True)
# NOTE:                                                                   ^^^^
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download