I'm playing around with group backreferences in Python's Regex to try to understand them and I'm not having much luck.
import re
leftQuotes = re.compile("((\"|\“)([\w|\d]))")
rightQuotes = re.compile("(([\w|\d])(\"|\”))")
s = "This is “problematic”"
s = re.sub(leftQuotes, r'‘\3', s)
s = re.sub(rightQuotes, r'’\3', s)
print(s)
This is ‘problemati’”
Results for each backreference in the second re.sub():
\1: ‘problemati’c”
\2: ‘problemati’c
\3: ‘problemati’”
To fix your code, replace the second sub with:
s = re.sub(rightQuotes, r'\2’', s)
This works because the word character is the second capture group in the second pattern, and it has to come before the replacement single quote.
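To see where each group number comes from, here is a minimal sketch that prints the groups of the second pattern. The names right_quotes, m and fixed are mine, and the pattern is the question's rightQuotes rewritten as a raw string, which should behave the same way:

```python
import re

# Same idea as rightQuotes in the question, written as a raw string.
right_quotes = re.compile(r'(([\w|\d])("|”))')

# The string as it looks after the first substitution.
m = right_quotes.search("This is ‘problematic”")

# Groups are numbered by the position of their opening parenthesis.
print(m.group(1))  # whole match: word character plus quote
print(m.group(2))  # the word character only
print(m.group(3))  # the quote only

# Keep the word character (\2), then append the new single quote.
fixed = right_quotes.sub(r'\2’', "This is ‘problematic”")
print(fixed)
```

Group 3 is the quote you want to get rid of, which is why using \3 in the replacement drops the "c" and keeps the ”.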
Besides, you don't really need capture groups here; lookarounds would be cleaner. (Not critical, but quoting the pattern with single quotes saves you escaping the double quotes, as @CasimiretHippolyte's comment points out.)
import re
leftQuotes = re.compile(r'(?:"|“)(?=\w)')   # raw string so \w reaches the regex engine untouched
rightQuotes = re.compile(r'(?<=\w)(?:"|”)')
s = "This is “problematic”"
s = re.sub(leftQuotes, r'‘', s)
s = re.sub(rightQuotes, r'’', s)
s
# 'This is ‘problematic’'
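The reason no backreference is needed is that lookarounds are zero-width: they check the neighbouring character without making it part of the match. A rough sketch, assuming a character class in place of the alternation and a hypothetical helper to_single_quotes (neither is from the code above):

```python
import re

left_quotes = re.compile(r'["“](?=\w)')    # quote followed by a word character
right_quotes = re.compile(r'(?<=\w)["”]')  # quote preceded by a word character

text = 'He said "yes" and “no”'

# The lookahead is not consumed, so the match is only the quote itself.
first = left_quotes.search(text).group(0)
print(first)

def to_single_quotes(s):
    """Hypothetical helper: apply both substitutions in order."""
    s = left_quotes.sub('‘', s)
    return right_quotes.sub('’', s)

result = to_single_quotes(text)
print(result)
```

Since only the quote is matched, the replacement string is just the new quote and the surrounding letters are left alone.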
Also, since \w includes \d, [\w|\d] can be replaced by \w.
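A quick way to check this, and to spot a side effect of the original class: inside [...] the | is a literal character rather than alternation, so [\w|\d] also matches a pipe. The variable names below are only for illustration:

```python
import re

# A digit already matches \w on its own.
digit_is_word = bool(re.fullmatch(r'\w', '7'))

# Inside a character class, | is literal, so the class accepts a pipe.
class_matches_pipe = bool(re.fullmatch(r'[\w|\d]', '|'))

# Plain \w does not match a pipe.
word_matches_pipe = bool(re.fullmatch(r'\w', '|'))

print(digit_is_word, class_matches_pipe, word_matches_pipe)
```

So switching to \w is slightly stricter than the original class, which is probably what was intended anyway.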