Paradoxis Paradoxis - 5 months ago 31
Python Question

Python re.compile curly quotes issue

I'm currently writing an application that uses a framework to match certain phrases, currently it is supposed to match the following regex pattern:

Say \"(.*)\"


However, I've notices that my users are complaining about the fact that their OS sometimes copies and pastes 'curly quotes' in, what ends up happening is that users provide the following sentence:

Say "Hello world!" <-- Matches
Say “Hello world!” <-- Doesn't match!


Is there any way I can tell Python's regular expressions to treat these curly quotes the same as regular quotes?

Answer

You could just include the curly quotes in the regex:

Say [\"“”](.*)[\"“”]

As something you can replicate in the Python repl, it's like this:

>>> import re
>>> test_str = r'"Hello"'
>>> reg = r'["“”](.*)["“”]'
>>> m = re.search(reg, test_str)
>>> m.group(1)
'Hello'
>>> test_str = r'“Hello world!”'
>>> m = re.search(reg, test_str)
>>> m.group(1)
'\x80\x9cHello world!\xe2\x80'
Comments