sedeh sedeh - 4 months ago 15
Python Question

combining two regex - lambda functions into one

I'd like to combine two regex functions into one to clean up my DataFrame. Assume I have the following DataFrame:

import pandas as pd
time = ["09:00", "10:00", "11:00", "12:00", "13:00", "33:00"]
result = ["+52", "+62", "+44 - 10a10", "+44", "+30 - $1200", "110"]
data = pd.DataFrame({'time' : time, 'result' : result})


data looks like this:

        result   time
0          +52  09:00
1          +62  10:00
2  +44 - 10a10  11:00
3          +44  12:00
4  +30 - $1200  13:00
5          110  33:00


First, I want to remove the + sign. Second, I want to remove the - sign and everything after it. I can accomplish that with two functions:

import re
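# Step 1: drop every literal '+' (raw string keeps the backslash intact)
data['result'] = data['result'].map(lambda x: re.sub(r'\+', '', x))
# Step 2: drop the first '-' and everything after it
data['result'] = data['result'].map(lambda x: re.sub(r'-.*', '', x))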


data now looks like this:

  result   time
0     52  09:00
1     62  10:00
2     44  11:00
3     44  12:00
4     30  13:00
5    110  33:00


Is there a way to do all the replacements in one step?

Answer

You can use alternation (|) in the regex and do both operations in one pass, like this:

>>> import re
>>> re.sub(r'\+|-.*', '', 'a+b+c-d+f-g')
'abc'
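
One detail worth checking on the sample data: the `-.*` branch starts matching at the hyphen, so a value such as '+44 - 10a10' keeps the space in front of the hyphen and comes out as '44 '. A minimal sketch, assuming that trailing space is unwanted, puts an optional `\s*` before the hyphen. The name `CLEAN` and the extra `\s*` are illustrative additions, not part of the pattern above.

```python
import re

# Hypothetical helper name; the pattern is the one above plus \s*
# so that whitespace directly before the hyphen is removed as well.
CLEAN = re.compile(r'\+|\s*-.*')

# Sample values taken from the question's 'result' column.
samples = ["+52", "+62", "+44 - 10a10", "+44", "+30 - $1200", "110"]

cleaned = [CLEAN.sub('', s) for s in samples]
print(cleaned)  # ['52', '62', '44', '44', '30', '110']
```

Compiling the pattern once is optional here; it mainly keeps the expression in one named place if it is reused.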

So, in your case, the lambda function would be

data['result'] = data['result'].map(lambda x: re.sub(r'\+|-.*', '', x))
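
If the column holds only strings, the same pattern can probably be handed to pandas directly, without `map` and a lambda. The following is a sketch, assuming a pandas version in which `Series.str.replace` accepts the `regex` keyword. The final `.str.strip()` is an extra step, not part of the question, that removes the space left in front of a deleted hyphen.

```python
import pandas as pd

# Same sample data as in the question.
time = ["09:00", "10:00", "11:00", "12:00", "13:00", "33:00"]
result = ["+52", "+62", "+44 - 10a10", "+44", "+30 - $1200", "110"]
data = pd.DataFrame({'time': time, 'result': result})

# regex=True is passed explicitly because the default has differed
# between pandas versions; .str.strip() trims leftover whitespace.
data['result'] = (
    data['result']
    .str.replace(r'\+|-.*', '', regex=True)
    .str.strip()
)

print(data['result'].tolist())  # ['52', '62', '44', '44', '30', '110']
```

The values are still strings after this step; converting them to numbers would be a separate operation.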