FunnyChef - 1 month ago
Python Question

Why is my regex expression failing?

Thanks for taking the time to read this.

I'm using Python pandas to merge two datasets on a column named 'title'. In one dataset, some title cells contain additional characters surrounded by parentheses, which causes the merge to fail on those cells. I'm trying to remove the parentheses and the values they contain using the code below; however, the merge still misses the updated data.

Data sample, code and regex are below.

I'm assuming that the regex is incorrect - any thoughts?

import pandas as pd

data1 = pd.DataFrame({'id': ['a12bcde0'], 'title': ['company_a']})

data2 = pd.DataFrame({'serial_number': ['01a2b345','10ab2030'],'title':['company_a','company_a (123)']})

data2['title'].replace(regex=True,inplace=True,to_replace=r"\(.*\)",value=r'')

pd.merge(data1, data2, on='title')

Answer

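You're forgetting the whitespace before the opening parenthesis in your pattern. With r"\(.*\)", 'company_a (123)' becomes 'company_a ' (note the trailing space), which still doesn't equal 'company_a', so the merge skips that row. Use to_replace=r"\s\(.*\)" instead.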
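
For reference, here is a minimal sketch of the question's sample data run through both patterns. The name without_space_fix is made up for illustration, and assigning the result back to the column (rather than using inplace=True as in the question) is a choice made here, not something the answer requires.

```python
import pandas as pd

data1 = pd.DataFrame({'id': ['a12bcde0'], 'title': ['company_a']})
data2 = pd.DataFrame({
    'serial_number': ['01a2b345', '10ab2030'],
    'title': ['company_a', 'company_a (123)'],
})

# Original pattern: strips "(123)" but leaves the space that preceded it.
# (without_space_fix is a hypothetical name, used only for comparison.)
without_space_fix = data2['title'].replace(to_replace=r"\(.*\)", value='', regex=True)
print(without_space_fix.tolist())

# Pattern from the answer: also consumes the whitespace character before "(".
# Assumption: the result is assigned back instead of relying on inplace=True.
data2['title'] = data2['title'].replace(to_replace=r"\s\(.*\)", value='', regex=True)

# Both rows of data2 now carry the same title as data1, so both should merge.
merged = pd.merge(data1, data2, on='title')
print(merged)
```

If the titles might have zero or several spaces before the parenthesis, r"\s*\(.*\)" would be a slightly more forgiving variant of the same idea.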