Umair Umair - 10 months ago 42
Python Question

Python re.sub not working as expected

I have this HTML

<b>Source: </b> <a href=\'http: //\'>text here</a><br><p class="normal">Creator: R.A. Fisher
<br><br>Donor: Namehere <b>\'@\'</b></u>)</p>

I want to remove multiple consecutive <br> tags from this using regex.

I am using this:
_str = re.sub(r'<br>\s*', '<br>', _str)

But it returns the string as it was, with no change at all.

If I use the same regex but specify a different replacement string, then it works:
_str = re.sub(r'<br>\s*', '', _str)

Answer

You're only stripping off spaces following <br> with that. You can instead use a positive lookahead to remove all <br>s that have another <br> immediately following:

re.sub(r'<br>(?=<br>)', '', _str)
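
As a rough check of the difference, here is a minimal sketch run on a shortened, hypothetical version of the HTML from the question (the `html` string and variable names are assumptions for illustration only):

```python
import re

# Hypothetical sample, shortened from the HTML in the question.
html = 'Creator: R.A. Fisher\n<br><br>Donor: Namehere'

# The attempt from the question: each <br> (plus any trailing whitespace)
# is matched and a <br> is put straight back, so adjacent tags survive.
unchanged = re.sub(r'<br>\s*', '<br>', html)

# The lookahead version: drop any <br> that is directly followed by
# another <br>, which leaves only the last tag of each run.
collapsed = re.sub(r'<br>(?=<br>)', '', html)

print(unchanged == html)  # True
print(collapsed)
```

The lookahead only checks what follows and does not consume it, so the final <br> of each run is never part of a match and stays in place.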

To also handle whitespace between consecutive <br> tags, use:

re.sub(r'<br>(?=\s*<br>)', '', _str)
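
A small sketch of how the two patterns behave when the tags are separated by whitespace. The input string is made up for illustration, and the final one-pass pattern is an alternative that is not part of the answer above:

```python
import re

# Hypothetical input with a space and a newline between the tags.
html = 'Fisher<br> <br>\n<br>Donor'

# Strict lookahead: no <br> is *immediately* followed by another one,
# so nothing is removed.
strict = re.sub(r'<br>(?=<br>)', '', html)

# Whitespace-tolerant lookahead: the first two tags are removed, but the
# whitespace that separated them is not consumed and stays behind.
tolerant = re.sub(r'<br>(?=\s*<br>)', '', html)

# Assumed alternative: match a whole run of tags together with the
# whitespace after each one and replace it with a single <br>.
# Note that this also strips whitespace after a lone <br>.
one_pass = re.sub(r'(?:<br>\s*)+', '<br>', html)

print(repr(strict))
print(repr(tolerant))
print(repr(one_pass))
```

Which variant fits depends on whether the leftover whitespace matters; in rendered HTML it usually collapses anyway.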