Rodrigo Laguna - 12 days ago
Python Question

Match the same pattern at the beginning and end twice

I need to replace words in upper case with the tag '###'. Let's suppose the whole text uses only this character set: [a-zA-Z\s]


I'm doing this:

re.sub(r'(^|\s)([A-Z]+)(\s|$)', r'\1###\3', 'Hello PYTHON WORLD')


but instead of getting 'Hello ### ###', it returns 'Hello ### WORLD'.

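The problem is that re.sub replaces all non-overlapping matches. Groups 1 and 3 are equal (see the note below), but re.sub isn't using the space between PYTHON and WORLD twice, once as group 3 of the PYTHON match and once as group 1 of the WORLD match.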

How do I solve it? I'm using Python 3.

Note on "equal": almost equal. They differ in ^ and $ at the beginning/end of the string, but this isn't the problem.

Answer

To replace all upper-case words with ### use the following approach:

import re

s = 'Hello PYTHON WORLD'
# \b matches at a word boundary without consuming any characters
replaced = re.sub(r'\b[A-Z]+\b', '###', s)
print(replaced)

The output:

Hello ### ###

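\b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string. It is a zero-width assertion, so it consumes no characters: the space between PYTHON and WORLD can delimit both words, whereas in the original pattern that space is consumed by the PYTHON match and is no longer available to start the WORLD match.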
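To see why the pattern from the question stops after PYTHON, it may help to print the spans that re.finditer reports for it. The sketch below also shows one possible alternative to \b: lookarounds that require whitespace (or the string edge) on both sides, which is closer to the original (^|\s)...(\s|$) intent. The second input string, with a hyphen, is a made-up example that falls outside the character set stated in the question; it is there only to show where the two approaches could differ.

```python
import re

s = 'Hello PYTHON WORLD'

# Pattern from the question: the whitespace on both sides is part of the match.
consuming = r'(^|\s)([A-Z]+)(\s|$)'
spans = [m.span() for m in re.finditer(consuming, s)]
print(spans)  # [(5, 13)] -> ' PYTHON ' including both spaces

# The search resumes at index 13 ('W'). There is no whitespace left before
# WORLD, and ^ only matches at index 0, so WORLD is never matched.

# Alternative: zero-width lookarounds. (?<!\S) means "not preceded by a
# non-whitespace character", (?!\S) means "not followed by one".
lookaround = r'(?<!\S)[A-Z]+(?!\S)'
fixed = re.sub(lookaround, '###', s)
print(fixed)  # Hello ### ###

# Hypothetical input outside the question's [a-zA-Z\s] character set:
other = 'NASA-funded TEAM'
loose = re.sub(r'\b[A-Z]+\b', '###', other)   # \b sees a boundary at '-'
strict = re.sub(lookaround, '###', other)     # '-' is not whitespace
print(loose)   # ###-funded ###
print(strict)  # NASA-funded ###
```

For text limited to letters and whitespace, as assumed in the question, both patterns give the same result, so \b is the simpler choice.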