O.rka O.rka - 14 days ago 8
Python Question

Find all overlapping patterns infinite loop, using Python 3 and dynamic programming

I was looking at:
Python regex find all overlapping matches? and the

re.finditer
didn't work for me. I don't want to download yet another module (i.e.
regex
) to replace the built-in
re
. I thought I could write my own but my understanding of
while loops
is limited.

I'm trying to make a search wrapper that finds all patterns even if they are overlapping (one thing the current
findall
does not do in
re
).

I've also never tried to program something like this so instead of using a built-in module for this, I would like to try and construct my own function so I can learn how to program dynamically.

def findall_positions(pattern, sequence):
positions = list()
cont = True

while cont == True:
match = re.search(pattern, sequence)
if match is not None:
positions.append(match.start())
sequence = sequence[positions[-1]:]
if match is None:
cont = False
return positions

findall_positions("AB","ABBABBABBBA")


My logic was to do
re.search
and if there is a hit, append the start to
positions
, then get next stretch of the string after that first
re.search match
and iterate through this until
match is None
. However, I'm getting an infinite loop. How can I restructure this to get the correct results? I'm looking for an output of
[0,3,6]

Answer

The issue with above approach is that the start of first match is at offset 0 and thus on the first iteration positions is [0]. Then sequence = sequence[positions[-1]:] naturally results to original string and thus you have infinite loop.

You could have a separate variable to keep track of the offset and construct new string every time you do re.search. If match is found then adjust the position accordingly:

import re

def findall_positions(pattern, sequence):
    positions = list()
    cont = True
    offset = 0

    while cont == True:
        match = re.search(pattern, sequence[offset:])
        if match is not None:
            positions.append(match.start() + offset)
            offset = positions[-1] + 1
        if match is None:
            cont = False
    return positions

print(findall_positions("AB","ABBABBABBBA"))

Output:

[0, 3, 6]