David Metcalfe - 2 years ago
Python Question

Trouble with Regex backreference in Python

I'm playing around with group backreferences in Python's Regex to try to understand them and I'm not having much luck.

import re

leftQuotes = re.compile("((\"|\“)([\w|\d]))")
rightQuotes = re.compile("(([\w|\d])(\"|\”))")

s = "This is “problematic”"

s = re.sub(leftQuotes, r'‘\3', s)
s = re.sub(rightQuotes, r'’\3', s)

print(s)


Output:

This is ‘problemati’”


In the first re.sub(), I managed to successfully replace the left double quotation mark with a single left quotation mark while keeping the matching character (in this case, a "p"). But the right side doesn't behave in the same way, regardless of the group backreference (1, 2, 3).

Results of backreferences:

\1: ‘problemati’c”
\2: ‘problemati’c
\3: ‘problemati’”

Answer Source

To fix your code, replace the second sub with:

s = re.sub(rightQuotes, r'\2’', s)

This works because the word character is the second capture group in the second pattern (the third group is the quote itself), and it has to come before the replacement single quote rather than after it.
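To see why \2 is the right group here, it can help to print the groups of the match directly. This is a small sketch using the second pattern from the question; the names right_quotes, text, match, groups, and fixed are only for illustration.

```python
import re

# Same pattern as rightQuotes in the question, written as a raw string
right_quotes = re.compile(r'(([\w|\d])("|”))')

# State of s after the first re.sub() in the question
text = "This is ‘problematic”"

match = right_quotes.search(text)
groups = match.groups()
# group 1: whole match, group 2: word character, group 3: the quote
print(groups)

# Keep the word character (group 2) and append the new single quote
fixed = right_quotes.sub(r'\2’', text)
print(fixed)
```

Group 3 is the quote being replaced, which is why r'’\3' drops the "c" and puts the old double quote back.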


Besides, you don't really need capture groups here; lookarounds would be cleaner. (Not critical, but quoting the pattern with single quotes saves you from escaping the double quotes, as @CasimiretHippolyte's comment points out.)

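import re

# Match an opening quote only when a word character follows it
leftQuotes = re.compile(r'(?:"|“)(?=\w)')
# Match a closing quote only when a word character precedes it
rightQuotes = re.compile(r'(?<=\w)(?:"|”)')

s = "This is “problematic”"

s = re.sub(leftQuotes, r'‘', s)
s = re.sub(rightQuotes, r'’', s)

s
# 'This is ‘problematic’'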
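If you want to reuse the lookaround version, the two substitutions can be wrapped in a function. The sketch below assumes a hypothetical helper name, to_single_quotes, and a few made-up test strings; it is one way to package the approach, not the only one.

```python
import re

# Lookaround patterns: the neighbouring word character is checked but not consumed
LEFT = re.compile(r'(?:"|“)(?=\w)')
RIGHT = re.compile(r'(?<=\w)(?:"|”)')

def to_single_quotes(text):
    """Hypothetical helper: apply both lookaround substitutions in order."""
    text = LEFT.sub('‘', text)
    return RIGHT.sub('’', text)

curly = to_single_quotes('This is “problematic”')
straight = to_single_quotes('She said "hi" to me')
punctuated = to_single_quotes('“Hello!”')

print(curly)
print(straight)
print(punctuated)
```

Note that in the last string the closing quote follows "!" rather than a word character, so the lookbehind does not match and that quote is left unchanged. If you need to handle such cases, the lookbehind would have to allow punctuation as well.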

Also, since \w already includes \d, [\w|\d] can be replaced by \w.
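One side effect of the original class is worth checking: inside square brackets, | is a literal character rather than alternation, so [\w|\d] also matches a pipe. A quick sketch to confirm (the variable names are only illustrative):

```python
import re

original_class = re.compile(r'[\w|\d]')
simplified = re.compile(r'\w')

# \w alone already matches digits
digit_ok = simplified.fullmatch('7') is not None

# The | inside [...] is a literal pipe, so only the original class matches it
pipe_in_original = original_class.fullmatch('|') is not None
pipe_in_simplified = simplified.fullmatch('|') is not None

print(digit_ok, pipe_in_original, pipe_in_simplified)
```

So switching to \w is not only shorter, it also avoids matching a stray "|" next to a quote.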
