user3776749 user3776749 - 1 year ago 44
Python Question

Python Regular Expressions: making multiple different substitutions in a single pass using Groups

I'm tasked with taking a string, finding all instances of two different types of matches in that string, and performing a similar-but-different replacement on each match of each type, all using a single regex and a single pass through re.sub().


Specifically, I'm looking for any < or <= and replacing them with > and >= respectively. Each comparison operator in need of replacement sits between two words, as defined by \w*, with zero or more spaces (\s*) on either side.

I have found a regular expression that finds all necessary matches and lumps them into useful groups:

((\b\w*(\s*<\s*)\w*\b)|(\b\w*(\s*<=\s*)\w*\b))+


This parses the string so that every comparison meeting the search criteria is matched, with each < captured in match group \3 and each <= captured in match group \5.


My question is this: is there a way to replace all \3 with ' > ' and all \5 with ' >= ' in a single call to re.sub()? I've read through the documentation for the sub method in Python's re module but haven't been able to find a way, perhaps due to my limited familiarity with the syntax and behavior of the whole system.

I am allowed and expected to compile the regex separately before the substitution, so the final setup will look something like this:

r1 = re.compile(r"((\b\w*(\s*<\s*)\w*\b)|(\b\w*(\s*<=\s*)\w*\b))+")
subStr = r" ??? "

r1.sub( ???, subStr ??? )


Here is some example input/output:

Input string:


"v1 < v2 v3 <= v4 v5 > v6 v7 >= v8"


Running the substitution would produce:


"v1 > v2 v3 >= v4 v5 > v6 v7 >= v8"


Plugging my pattern and the input string into https://regex101.com/ with the Python flavor shows how the pattern matches the input string in the way I described.

Answer Source

You only have to make the = optional and to capture parts around the <:

re.sub(r'\b(?<=\w)(\s*)<(=?\s*\w)', r'\1>\2', s)

For efficiency, I started the pattern with the word boundary \b; the lookbehind (?<=\w) that follows ensures there is at least one word character before the operator.
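Since the question asks for a pattern compiled ahead of the substitution, the same idea can be sketched with re.compile(). The names LESS_OP and flip_less_than are made up for illustration; the pattern and replacement string are the ones given above.

```python
import re

# Same pattern as in the answer, compiled once up front.
# Group 1: spaces before "<"; group 2: optional "=", spaces, first word character.
LESS_OP = re.compile(r'\b(?<=\w)(\s*)<(=?\s*\w)')

def flip_less_than(text):
    """Turn "<" into ">" and "<=" into ">=" in a single sub() call."""
    # \1 and \2 put back everything that was captured around the "<".
    return LESS_OP.sub(r'\1>\2', text)

s = "v1 < v2 v3 <= v4 v5 > v6 v7 >= v8"
print(flip_less_than(s))
# v1 > v2 v3 >= v4 v5 > v6 v7 >= v8
```

Because the optional = lives inside the second group, a single replacement string covers both operators and the original spacing is preserved.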

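If the two operators really did need unrelated replacement text, one option is to pass a function as the replacement argument of sub(), which the re module allows. The following is only a sketch under assumptions: the pattern is a simplified one written for this illustration (not the pattern from the question or the answer), and the names OPERATOR, REPLACEMENTS, swap_operator and flip_with_callable are hypothetical.

```python
import re

# Assumed pattern for illustration: "<" or "<=" with a word character on
# each side, together with any spaces around the operator.
OPERATOR = re.compile(r'(?<=\w)\s*(<=?)\s*(?=\w)')

# Each captured operator gets its own replacement text.
REPLACEMENTS = {'<': ' > ', '<=': ' >= '}

def swap_operator(match):
    """Pick the replacement based on which operator this match captured."""
    return REPLACEMENTS[match.group(1)]

def flip_with_callable(text):
    # sub() calls swap_operator once per match, still in a single pass.
    return OPERATOR.sub(swap_operator, text)

print(flip_with_callable("v1 < v2 v3 <= v4 v5 > v6 v7 >= v8"))
# v1 > v2 v3 >= v4 v5 > v6 v7 >= v8
```

Note that this variant normalizes the spacing around each replaced operator to a single space on either side, matching the ' > ' and ' >= ' strings in the question.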