Cadel Watson - 1 month ago
Python Question

Python re.sub: replace part of matching string that contains an arbitrary number of capturing groups

I know that there are other questions which deal with the problem of replacing only part of a matching string using re.sub, but the answers revolve around referring back to capturing groups. My situation is a bit different:

I'm generating regexes like '(?:i|æ|ʏ|ɞ).(?:i|æ|ʏ|ɞ)' and '^.' in another part of the application. If I have the string 'abcd' and the pair ('b', 'c'), I want to replace every 'b' with 'c' wherever the regex matches with that 'b' at the position of the period character (.).

For example, if I have the rule '(?:x|y|z).(?:h|i|j)', and the desired change is a to b, the following should occur:

xah -> xbh
yai -> ybi
zaz -> zaz (no change)


I've tried using re.sub, substituting my target for the . in the search string and my replacement for the . in the replacement string, but this replaces the whole match in the target string, when I only want to change a small part of it. My problem with using match groups and referring back to them in the replacement is that I don't know how many there will be, or what order they'll be in (there might not even be any), so I'm trying to find a flexible solution.
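
To make the problem concrete, here is a minimal sketch of one reading of that attempt. The variable names are only illustrative, and it assumes the rule contains a single '.' placeholder:

```python
import re

rule = '(?:x|y|z).(?:h|i|j)'
target, replacement = 'a', 'b'

# Put the target in place of the '.' in the search string...
search = rule.replace('.', target, 1)
# ...and the replacement in place of the '.' in the replacement string.
repl = rule.replace('.', replacement, 1)

# re.sub swaps out the entire match ('xah'), and the replacement string is
# inserted literally, so the result is not the desired 'xbh'.
result = re.sub(search, repl, 'xah')
print(result)
```

Passing just 'b' as the replacement string has the same underlying problem: the whole of 'xah' is replaced, not only the 'a'.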

Any help is much appreciated! It's quite difficult to explain, so please ask if further clarification is needed :).

Answer

You could use "lookahead" and "lookbehind" assertions, like so:

import re

tests = (
    ('xah', 'xbh'),
    ('yai', 'ybi'),
    ('zaz', 'zaz'),
)

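for test_in, test_out in tests:
    # (?<=...) and (?=...) check the characters around 'a' without
    # consuming them, so only the 'a' itself is replaced.
    out = re.sub(r'(?<=x|y|z)a(?=h|i|j)', 'b', test_in)
    assert test_out == out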
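
Since the rules in the question are generated elsewhere, the same idea could be wrapped in a helper that builds the lookaround pattern from a rule. This is only a sketch: apply_rule is a hypothetical name, and it assumes each rule contains exactly one '.' placeholder and no other unescaped '.' before it.

```python
import re

def apply_rule(rule, target, replacement, text):
    """Replace target with replacement wherever rule matches with
    target standing at the rule's '.' placeholder."""
    # Assumption: the first '.' in the rule is the placeholder.
    before, dot, after = rule.partition('.')
    if not dot:
        raise ValueError("rule has no '.' placeholder")

    pattern = ''
    if before:
        # Lookbehind: must precede the target, but is not consumed.
        pattern += '(?<=' + before + ')'
    # Escape the target so it is matched literally.
    pattern += re.escape(target)
    if after:
        # Lookahead: must follow the target, but is not consumed.
        pattern += '(?=' + after + ')'

    # A function replacement keeps backslashes in replacement literal.
    return re.sub(pattern, lambda m: replacement, text)

print(apply_rule('(?:x|y|z).(?:h|i|j)', 'a', 'b', 'xah'))
print(apply_rule('(?:x|y|z).(?:h|i|j)', 'a', 'b', 'zaz'))
```

One caveat: Python's re module only accepts fixed-width lookbehind patterns, so this works when every alternative before the '.' has the same length (as in the rules shown in the question), but raises re.error for a prefix such as '(?:x|yz)'.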