chris zeng - 22 days ago 11
Python Question

Python regex: remove all consecutive duplicates EXCEPT www

Basically, I have strings like these:

wwwccccheapflightscom
aaaamypage
wwwregularexpressions


Right now I have this, which removes every run of three or more identical consecutive characters:

import re
re.sub(r"(\w)\1{2,}", "", string)
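A quick sketch of what the current pattern does to the three sample strings (the names `samples` and `results` are just for illustration). Every run of three or more identical word characters is deleted outright, which is why the leading www disappears along with the whole cccc/aaaa run.

```python
import re

samples = ["wwwccccheapflightscom", "aaaamypage", "wwwregularexpressions"]

# (\w) captures one word character and \1{2,} demands at least two more copies,
# so every run of 3+ identical characters (including "www") is replaced by ''.
results = [re.sub(r"(\w)\1{2,}", "", s) for s in samples]
print(results)
# ['heapflightscom', 'mypage', 'regularexpressions']
```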


But I don't want it to remove "www", and I want to KEEP the first occurrence of each run of repeated characters. So, based on the examples, I want:

wwwcheapflightscom
amypage
wwwregularexpressions

Answer

Add a negative lookahead for www, i.e. (?!www), at the beginning of your pattern:

(?!www)(\w)\1{2,}

Demo: https://regex101.com/r/kXBAgV/1
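A small sketch (variable names are illustrative) showing which runs the lookahead version actually matches. Because (?!www) is tested before every match attempt, no match can start where the next three characters are "www"; and starting one position later leaves too few w's to satisfy \1{2,}. Note that replacing with '' still deletes the entire matched run, first character included.

```python
import re

samples = ["wwwccccheapflightscom", "aaaamypage", "wwwregularexpressions"]

# (?!www) is checked at each starting position: if "www" follows, no match may start there.
pattern = re.compile(r"(?!www)(\w)\1{2,}")

# Collect the runs that would be matched (and therefore replaced) in each string.
matched = {s: [m.group() for m in pattern.finditer(s)] for s in samples}
print(matched)
# {'wwwccccheapflightscom': ['cccc'], 'aaaamypage': ['aaaa'], 'wwwregularexpressions': []}

# Substituting '' drops each matched run completely.
removed = [pattern.sub("", s) for s in samples]
print(removed)
# ['wwwheapflightscom', 'mypage', 'wwwregularexpressions']
```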

If you want to keep the first occurrence, substitute with \1, as suggested by @bobblebubble.

bobble's Demo: https://www.regex101.com/r/4bjQlu/1
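In Python, the \1 substitution looks roughly like this. The helper name `collapse_runs` is hypothetical; the pattern is the one from the answer, with the captured character written back in place of each run.

```python
import re

def collapse_runs(s: str) -> str:
    # Replace each run of 3+ identical characters with a single copy (\1),
    # while (?!www) keeps a "www" prefix from being collapsed.
    return re.sub(r"(?!www)(\w)\1{2,}", r"\1", s)

for s in ["wwwccccheapflightscom", "aaaamypage", "wwwregularexpressions"]:
    print(collapse_runs(s))
# wwwcheapflightscom
# amypage
# wwwregularexpressions
```

The replacement must be a raw string (r"\1"); a plain "\1" is the control character chr(1), not a backreference. Runs of exactly two, such as the "ss" in "expressions", are left alone because \1{2,} requires at least three characters in total.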


Alternatively, you can use a positive lookbehind, (?<=...).

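Note: This will not work in Python, because Python's re module requires a lookbehind to have a fixed width, and the two alternatives here have different lengths (one character vs. four). It will work in PHP, whose PCRE engine allows lookbehind alternatives of different fixed lengths: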

(?<=(\w)|(www\w))(?:\w)\1{2,}

Demo: https://regex101.com/r/kXBAgV/3
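A minimal check of the Python limitation mentioned in the note (variable names are illustrative): compiling the PHP-style pattern with Python's built-in re module raises re.error instead of producing a usable pattern.

```python
import re

lookbehind = r"(?<=(\w)|(www\w))(?:\w)\1{2,}"

# Python's re needs every branch of a lookbehind to match the same fixed width;
# (\w) is 1 character and (www\w) is 4, so compilation is rejected.
try:
    re.compile(lookbehind)
    compiled = True
except re.error as exc:
    compiled = False
    print("re.error:", exc)
```

In Python, the negative-lookahead pattern from the first part of the answer is the one to use.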
