PratPor - 7 months ago
Python Question

regex - merge repeated consecutive words preserving last space

I have a string like this:

{{TAG}} {{TAG}}{{TAG}} {{TAG}} some other text. {{TAG}} {{TAG}}

and I am trying to merge multiple consecutive occurrences of {{TAG}} into one. So I have this regex:
re.sub(r'(({{TAG}})\s*)+', "{{TAG}}", text)
which removes the repeated occurrences and gives me this:

{{TAG}}some other text. {{TAG}}

But it also consumes the space after the last {{TAG}} in each run, which I am trying to avoid. I want to get this instead:

{{TAG}} some other text. {{TAG}}

I found a similar question here, but it didn't solve my problem. Any suggestions to improve my regex, or any alternative in Python?


One simple way is to drop the + and split the regex into two parts: a repeated group that allows trailing whitespace, followed by a final {{TAG}} that does not:

>>> re.sub(r'(?:{{TAG}}\s*)*{{TAG}}', r'{{TAG}}', string)
'{{TAG}} some other text. {{TAG}}'
  • (?:{{TAG}}\s*)* Matches zero or more {{TAG}}, each followed by optional whitespace.

  • {{TAG}} Matches the last {{TAG}} of the run, without consuming the whitespace after it.
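
As a self-contained sketch of this first pattern: the helper name merge_repeats and its tag parameter are hypothetical, not part of the original answer. It assumes the tag should be treated as a literal string, so it is passed through re.escape rather than relying on the braces being read literally.

```python
import re

def merge_repeats(text, tag="{{TAG}}"):
    # merge_repeats is a hypothetical helper name used only for illustration.
    t = re.escape(tag)  # escape the braces so they are always matched literally
    # Zero or more "tag + optional whitespace", then one final tag
    # that leaves any whitespace after it untouched.
    pattern = rf"(?:{t}\s*)*{t}"
    # A function replacement avoids backslash processing in the replacement text.
    return re.sub(pattern, lambda m: tag, text)

text = "{{TAG}} {{TAG}}{{TAG}} {{TAG}} some other text. {{TAG}} {{TAG}}"
print(merge_repeats(text))  # {{TAG}} some other text. {{TAG}}
```

Each run of tags is replaced in a single substitution, and the space before "some other text." survives because the final {{TAG}} in the pattern has no \s* after it.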

You can also solve this using a positive lookahead:

>>> re.sub(r'{{TAG}}\s*(?={{TAG}})', r'', string)
'{{TAG}} some other text. {{TAG}}'
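  • {{TAG}}\s* Matches one {{TAG}} followed by optional whitespace.

  • (?={{TAG}}) Positive lookahead. Asserts that the {{TAG}} matched in the point above is followed by another {{TAG}}, without consuming it, so only the redundant tags are removed and the last one in each run is kept.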
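
The lookahead variant can be sketched the same way. The function name drop_redundant_tags is hypothetical, and the tag is again assumed to be a literal string that is escaped with re.escape.

```python
import re

def drop_redundant_tags(text, tag="{{TAG}}"):
    # drop_redundant_tags is a hypothetical helper name used only for illustration.
    t = re.escape(tag)
    # Delete a tag (and any whitespace after it) only when another tag
    # follows immediately; the lookahead does not consume that next tag.
    return re.sub(rf"{t}\s*(?={t})", "", text)

text = "{{TAG}} {{TAG}}{{TAG}} {{TAG}} some other text. {{TAG}} {{TAG}}"
print(drop_redundant_tags(text))  # {{TAG}} some other text. {{TAG}}
```

Unlike the split-pattern approach, this performs one substitution per redundant tag rather than one per run, but the resulting string is the same for this input.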