Simple runner - 1 month ago
Python Question

Remove only double-letter sequences from a word as fast as possible in Python

Finding and replacing double letters in a string is a very common task. But is there a solution that removes double letters in several steps, so that pairs created by an earlier removal are removed too? For example, given the string "skalallapennndraaa", the output after removing double letters should be "skalpendra". I tried

re.sub(r'([a-z])\1+', r'\1', "skalallapennndraaa")

but this doesn't remove all double letters in the string (the result is "skalalapendra"). If I use r'' as the second parameter, I get a closer result, "skalaapendr", but I still can't find the right regular expression for the replacement parameter. Any ideas?

Answer

You can use this double replacement:

>>> import re
>>> s = 'skalallapennndraaa'
>>> print(re.sub(r'([a-z])\1', '', re.sub(r'([a-z])([a-z])\2\1', '', s)))
skalpendra

The inner pattern ([a-z])([a-z])\2\1 removes cases like alla, and the outer pattern ([a-z])\1 then removes the remaining double letters.
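
Note that two passes only cover one level of nesting. A small sketch of where that stops working, with the double replacement wrapped in a helper (the name two_pass is made up for this illustration):

```python
import re

def two_pass(s):
    # Hypothetical helper: the double replacement from this answer.
    # First pass removes "abba"-style groups, second pass removes plain pairs.
    inner = re.sub(r'([a-z])([a-z])\2\1', '', s)
    return re.sub(r'([a-z])\1', '', inner)

print(two_pass('skalallapennndraaa'))  # skalpendra
print(two_pass('nballabnz'))           # nnz
```

In the second call, removing "alla" and then "bb" brings two "n" characters together, but no further pass runs to remove them.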


Update: Based on the comments below, I realize a loop-based approach is best. Here it is:

>>> s = 'nballabnz'
>>> while re.search(r'([a-z])\1', s):
...     s = re.sub(r'([a-z])\1', '', s)
...
>>> print(s)
z
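
Since the question asks about speed: the loop rescans the string on every pass, which may get slow when pairs are deeply nested. One possible alternative, not part of the original answer, is a single left-to-right pass that uses a list as a stack. This is only a sketch: the name remove_pairs is made up here, and unlike the regex it pairs up any equal characters, not just a-z.

```python
def remove_pairs(s):
    """Repeatedly remove adjacent equal characters in one pass over s."""
    stack = []
    for ch in s:
        if stack and stack[-1] == ch:
            stack.pop()        # ch pairs with the last kept character: drop both
        else:
            stack.append(ch)   # no pair so far: keep ch for now
    return ''.join(stack)

print(remove_pairs('skalallapennndraaa'))  # skalpendra
print(remove_pairs('nballabnz'))           # z
```

Each character is pushed and popped at most once, so the work grows linearly with the length of the string. For lowercase input it should give the same result as the loop above, because the order in which pairs are removed does not change what is left at the end.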