FunnyChef - 1 year ago
Python Question

Why is my regex expression failing?

Thanks for taking the time to read this.

I'm using Python pandas to merge two datasets on a column named 'title'. Some of the title cells in one dataset have additional characters surrounded by parentheses, which causes the merge to fail on those cells. I'm trying to remove the parentheses and the values they contain using the following; however, the merge still misses the updated data.

Data sample, code and regex are below.

I'm assuming that the regex is incorrect - any thoughts?

import pandas as pd

data1 = pd.DataFrame({'id': ['a12bcde0'], 'title': ['company_a']})

data2 = pd.DataFrame({'serial_number': ['01a2b345','10ab2030'],'title':['company_a','company_a (123)']})


pd.merge(data1, data2, on='title')

Answer

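You're forgetting the whitespace before the opening parenthesis in your pattern: to_replace=r"\s\(.*\)". Without the \s, 'company_a (123)' becomes 'company_a ' with a trailing space, which still doesn't match 'company_a', so the merge skips that row.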
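For reference, here is a minimal sketch of how the corrected pattern could be applied to the sample data from the question. The original replace call isn't shown, so the use of Series.replace with regex=True and the variable name merged are assumptions made for illustration.

```python
import pandas as pd

data1 = pd.DataFrame({'id': ['a12bcde0'], 'title': ['company_a']})
data2 = pd.DataFrame({'serial_number': ['01a2b345', '10ab2030'],
                      'title': ['company_a', 'company_a (123)']})

# Remove one whitespace character plus the parenthesised suffix,
# so 'company_a (123)' becomes 'company_a' with no trailing space.
# (Assumed call: the question's original replace line is not available.)
data2['title'] = data2['title'].replace(to_replace=r"\s\(.*\)", value="", regex=True)

# Both rows of data2 now share the key 'company_a' with data1.
merged = pd.merge(data1, data2, on='title')
print(merged)
```

If some titles might have no space, or several spaces, before the parenthesis, \s* in place of \s would likely cover those cases too; the sample data doesn't show whether that's needed.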
