TheBandit - 2 months ago
Python Question

Regex multiple matches in a search

I am using search and then group to extract specific parts of a string. The problem is that it only finds the first occurrence, which is expected, because that is how search works.

I need to find every occurrence, but findall returns a list, which is not what I want, and I can't call group() on it, so it would take a lot of extra steps. Is there another way to do this?

Here is the code I have:

for num, line in enumerate(file, 1):
    if check in line:
        print 'href at line', num
        reg = re.compile('href="(.*?)"|href=\'(.*?)\'')
        link = re.search(reg, line)
        link = link.group(1)
        print 'url:', link


I only get the first url in the line.

Answer

Use re.finditer and loop over the result. finditer returns an iterator that yields one match object for each non-overlapping match in the line, so you can call group() on every match rather than only the first.

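# Move compile outside the loop; the whole point of compiling is to do the work once
# and reuse the compiled object over and over
reg = re.compile('href="(.*?)"|href=\'(.*?)\'')
for num, line in enumerate(file, 1):
    if check in line:
        print 'href at line', num
        for link in reg.finditer(line):
            # Only one of the two groups takes part in a given match:
            # group 1 for double quotes, group 2 for single quotes.
            # lastindex is the number of the group that matched.
            print 'url:', link.group(link.lastindex)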
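
The pattern above has two capture groups, so the URL ends up in group 1 for a double-quoted href and in group 2 for a single-quoted one. One way to sidestep that, sketched here as a suggestion rather than as part of the original code, is to capture the opening quote and reuse it with a backreference, so the URL is always in the same group. The helper name extract_hrefs and the sample line are made up for illustration.

```python
import re

# Group 1 captures the opening quote; \1 requires the same quote to close.
# The URL itself is therefore always in group 2, whichever quote style is used.
HREF_RE = re.compile(r'href=(["\'])(.*?)\1')


def extract_hrefs(line):
    """Return every href value found in one line (hypothetical helper)."""
    return [match.group(2) for match in HREF_RE.finditer(line)]


# Made-up sample line with one double-quoted and one single-quoted href.
sample = '<a href="/a">A</a> <a href=\'/b\'>B</a>'
print(extract_hrefs(sample))
```

A line with no href yields an empty list, so the result can be looped over without checking for None first.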