Cadel Watson - 1 year ago
Python Question

Python re.sub: replace part of matching string that contains an arbitrary number of capturing groups

I know that there are other questions which deal with the problem of replacing only part of a matching string using re.sub, but the answers revolve around referring back to capturing groups. My situation is a bit different:

I'm generating regexes like

in another part of the application. If I have the string
, and the pair
('b', 'c')
, I want to replace all instances of
where the regex matches at the period character (

For example, if I have the rule
, and the desired change is
, the following should occur:

xah -> xbh
yai -> ybi
zaz -> zaz (no change)

I've tried using
, replacing the
with my target in the search string and with my replacement in the replacement string, but this replaces the whole match in the target string, when in reality I only want to change a small part. My problem with using match groups and referring back to them in the replacement is that I don't know how many there will be, or what order they'll be in - there might not even be any - so I'm trying to find a flexible solution.
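The behaviour described above can be reproduced with a minimal sketch. The pattern `[xy]a[hi]` is a hypothetical stand-in for one of the generated regexes, not the actual rule from the application:

```python
import re

# Hypothetical rule: 'a' preceded by x or y and followed by h or i.
pattern = r'[xy]a[hi]'

# re.sub replaces the entire match, so the surrounding
# context characters are lost along with the 'a'.
result = re.sub(pattern, 'b', 'xah')
print(result)  # b
```

The desired result here would be `xbh`, which is why the context has to be either preserved or never consumed in the first place.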

Any help is much appreciated! It's quite difficult to explain, so if further clarification is needed, please ask :).

Answer Source

You could use "lookahead" and "lookbehind" assertions, like so:

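import re

# (input, expected output) pairs from the question
tests = (
    ('xah', 'xbh'),
    ('yai', 'ybi'),
    ('zaz', 'zaz'),
)

for test_in, test_out in tests:
    # The lookbehind and lookahead check the context without consuming it,
    # so only the 'a' itself is replaced.
    out = re.sub('(?<=x|y|z)a(?=h|i|j)', 'b', test_in)
    assert test_out == out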
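
One caveat with the lookaround approach: Python's `re` module only accepts fixed-width lookbehind patterns, so a generated prefix such as `xx|y` would be rejected. A possible alternative, sketched below, is to wrap only the target in a named group and rebuild the match in a replacement function. The helper `replace_part`, its parameters, and the group name `target` are all hypothetical names chosen for illustration, and the sketch assumes the generated regex can be split into the part before and the part after the target:

```python
import re

def replace_part(before, target, after, replacement, text):
    """Replace `target` with `replacement` wherever it appears
    between matches of the `before` and `after` patterns.

    Hypothetical helper: assumes the generated patterns do not
    already define a group named 'target'.
    """
    regex = re.compile(f"{before}(?P<target>{re.escape(target)}){after}")

    def swap(match):
        whole = match.group(0)
        # Offsets of the target relative to the start of the match.
        start = match.start("target") - match.start()
        end = match.end("target") - match.start()
        # Keep everything in the match except the target itself.
        return whole[:start] + replacement + whole[end:]

    return regex.sub(swap, text)

print(replace_part("(x|y|z)", "a", "(h|i|j)", "b", "xah"))  # xbh
print(replace_part("(x|y|z)", "a", "(h|i|j)", "b", "zaz"))  # zaz
```

Because the replacement function looks the target up by name, it does not matter how many other capturing groups the generated patterns contain or what order they appear in. Note that the extra group does shift the numbering of any groups in `after`, which matters only if the generated patterns use numbered backreferences.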
