Christian S - 1 month ago
Python Question

Replacing single words with re.sub in python

Spoiler: yes, this is an assignment. It is solved, but out of personal interest I would like to know the following.

At the moment I am working on a syntax marker for an assignment: we read in a file and, using a dictionary of regexes, colour the matching keywords accordingly.

I am having some issues, though.

for i in iterations:
    pass


For the loop above, the regex r'(\t*for.*in.*?:.?)' works, but it colours the entire line. While that is allowed, I would really like it to mark only for and in.

Trying r'(\bfor\b|\bin\b)' does not work for me, and neither do r'(for)' or r'(\sfor\s)'.

I read the whole code into one string and use re.sub() to replace all occurrences with colour + r'\1' + colour_end, where colour and colour_end hold the colour sequences.
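
For reference, the substitution described above might look roughly like the sketch below. The ANSI escape codes and the names COLOUR, COLOUR_END and colour_matches are assumptions for illustration only; the actual colour sequences are not shown in the question. It uses the whole-line pattern from above, so the entire for line ends up coloured.

```python
import re

# Hypothetical colour sequences (ANSI red and reset); the real ones are not given.
COLOUR = "\033[31m"
COLOUR_END = "\033[0m"

# The whole-line pattern from the question.
FOR_LINE = r'(\t*for.*in.*?:.?)'


def colour_matches(pattern, source):
    """Wrap every match of group 1 in the colour sequences."""
    return re.sub(pattern, COLOUR + r'\1' + COLOUR_END, source)


code = "for i in iterations:\n\tpass\n"
highlighted = colour_matches(FOR_LINE, code)
print(highlighted)
```

Because . does not match a newline by default, the match stops at the end of the for line and the body below it stays uncoloured.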

Answer

You may use capturing and backreferences:

^(\t*)(for\b)(.*)\b(in)\b(.*?:)

Replace with $1<color>$2</color>$3<color>$4</color>$5. See the regex demo.

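Here, the expression is split into five parts with (...) capturing groups. In the replacement pattern, the captured values are referenced with backreferences of the form $n, where n is the number of the capturing group in the pattern. In Python's re.sub() the same backreferences are written \1 to \5 (or \g<1> to \g<5>), and ^ matches at the start of every line only when the re.MULTILINE flag is set.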
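
A minimal Python sketch of this five-group approach, assuming the whole file has been read into one string. The <color> tags are the placeholders from the replacement above, and the names PATTERN, REPLACEMENT and mark_for_in are made up for illustration. Python's re module spells the backreferences \1 to \5 rather than $1 to $5, and re.MULTILINE is needed so that ^ anchors at every line start instead of only at the start of the string.

```python
import re

# Same expression as above; MULTILINE lets ^ match at the start of each line.
PATTERN = re.compile(r'^(\t*)(for\b)(.*)\b(in)\b(.*?:)', re.MULTILINE)

# $1..$5 from the answer, written in Python's \1..\5 syntax.
REPLACEMENT = r'\1<color>\2</color>\3<color>\4</color>\5'


def mark_for_in(source):
    """Wrap only the for and in keywords of each for-loop header."""
    return PATTERN.sub(REPLACEMENT, source)


code = "for i in iterations:\n\tpass\n"
marked = mark_for_in(code)
print(marked)
```

Swapping the <color> and </color> placeholders for the real colour sequences should be enough to plug this into the existing re.sub() call.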

If you cannot run a single regex with multiple capturing groups, run two in sequence:

  • ^(\t*)for\b(?=.*\bin\b.*?:) --> $1<color>for</color> (see this demo)
  • ^(\t*for\b.*)\bin\b(?=.*?:) --> $1<color>in</color> (see another demo).

The single capturing group is around the part before the word, and the part after the word is not matched but checked with a positive lookahead.
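
The two-pass variant could be sketched in Python as follows. The names FOR_RE, IN_RE and mark_two_pass are hypothetical, and <color> is again only a placeholder. One detail in this sketch is an assumption about ordering rather than something stated above: the in pass runs first, because its pattern expects the line to still begin with a literal for, which would no longer be true once for has been wrapped in tags.

```python
import re

# Pass for "for": the lookahead checks that "in ... :" follows without consuming it.
FOR_RE = re.compile(r'^(\t*)for\b(?=.*\bin\b.*?:)', re.MULTILINE)

# Pass for "in": group 1 keeps everything before the keyword.
IN_RE = re.compile(r'^(\t*for\b.*)\bin\b(?=.*?:)', re.MULTILINE)


def mark_two_pass(source):
    """Mark in first, then for, using one capturing group per pass."""
    source = IN_RE.sub(r'\1<color>in</color>', source)
    source = FOR_RE.sub(r'\1<color>for</color>', source)
    return source


code = "for i in iterations:\n\tpass\n"
marked = mark_two_pass(code)
print(marked)
```

Note that the second pass relies on \bin\b still matching inside <color>in</color>; a colour sequence that ends in a letter or digit directly before in would remove that word boundary, so the order or the patterns may need adjusting for real escape codes.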