warchest - 7 months ago
Python Question

re.search() only matches the first occurrence

I'm trying to match the pattern:

<--Header Title-->
some body text


The following only matches the first occurrence:

import re

string1 = """<-- Option 1 -->
Nice text
<--Final stuff-->
Listing all
of
the
text
"""

regex = re.compile(r"<--([\w\s]+)-->([\s\S]*?)(?=\n<--|$)")
m = regex.search(string1)
print(m.groups())


Which results in:

(' Option 1 ', '\nNice text')


However, it seems to work fine using pythex.

What am I doing wrong?

Answer

re.search only returns the first match in the string. To get every match, use re.finditer or re.findall instead.
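Applied to the question's own string and pattern, a minimal sketch of the findall route might look like this (Python 3 syntax; string1 and regex are copied unchanged from the question):

```python
import re

# The string and pattern from the question, unchanged.
string1 = """<-- Option 1 -->
Nice text
<--Final stuff-->
Listing all
of
the
text
"""
regex = re.compile(r"<--([\w\s]+)-->([\s\S]*?)(?=\n<--|$)")

# findall keeps scanning after the first match instead of stopping there.
# The pattern has two groups, so each element is a (header, body) tuple.
sections = regex.findall(string1)
print(sections)
# [(' Option 1 ', '\nNice text'), ('Final stuff', '\nListing all\nof\nthe\ntext')]
```

Note that each item is a tuple of the two captured groups, not the whole matched text, because the pattern contains more than one group.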

The documentation for re.search says:

"Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string."
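To make the last sentence of that quote concrete, here is a small sketch (Python 3) contrasting a search that finds nothing with one that finds an empty match. The patterns "x" and "x*" are arbitrary illustrations, not taken from the question:

```python
import re

# No position in "abc" matches "x", so search returns None.
no_match = re.search(r"x", "abc")
print(no_match)  # None

# "x*" may match zero characters, so search succeeds immediately at
# position 0 with an empty match: a match object, not None.
empty_match = re.search(r"x*", "abc")
print(empty_match.span(), repr(empty_match.group()))  # (0, 0) ''

# Test for None before calling methods on the result.
if no_match is None:
    print("no match")
```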

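re.finditer returns an iterator that yields a match object for every non-overlapping match in the target string, while re.findall returns a list of the matched substrings. If the pattern contains capture groups, findall returns the group contents instead: a list of strings for one group, or a list of tuples for two or more groups.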
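A quick sketch of how the shape of the findall result depends on the number of groups, using an arbitrary sample string (Python 3):

```python
import re

text = "a1 b2 c3"

# No groups: a list of the whole matched substrings.
no_groups = re.findall(r"[a-z]\d", text)
print(no_groups)   # ['a1', 'b2', 'c3']

# One group: a list containing only that group's text.
one_group = re.findall(r"([a-z])\d", text)
print(one_group)   # ['a', 'b', 'c']

# Two or more groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"([a-z])(\d)", text)
print(two_groups)  # [('a', '1'), ('b', '2'), ('c', '3')]

# finditer always yields full match objects, whatever the groups.
spans = [m.span() for m in re.finditer(r"([a-z])(\d)", text)]
print(spans)       # [(0, 2), (3, 5), (6, 8)]
```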

>>> import re
>>> re.findall('a', 'ababababa')
['a', 'a', 'a', 'a', 'a']

>>> x = list(re.finditer('a', 'ababababa'))
>>> x
[<_sre.SRE_Match object; span=(0, 1), match='a'>,
 <_sre.SRE_Match object; span=(2, 3), match='a'>,
 <_sre.SRE_Match object; span=(4, 5), match='a'>,
 <_sre.SRE_Match object; span=(6, 7), match='a'>,
 <_sre.SRE_Match object; span=(8, 9), match='a'>]
>>> x[0].group()
'a'
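Putting finditer to work on the question's data, here is a sketch that collects each section into a dict. The dict and the whitespace stripping are illustrative choices, not part of the original answer, and the sketch assumes every header is unique (a repeated header would overwrite the earlier body). It also relies on Python 3.7+ dicts keeping insertion order:

```python
import re

# The string and pattern from the question, unchanged.
string1 = """<-- Option 1 -->
Nice text
<--Final stuff-->
Listing all
of
the
text
"""
regex = re.compile(r"<--([\w\s]+)-->([\s\S]*?)(?=\n<--|$)")

# finditer yields one match object per section, so the m.groups()
# call from the question now runs once for every section.
sections = {}
for m in regex.finditer(string1):
    header, body = m.groups()
    # Drop the padding around the header and the newline before the body.
    sections[header.strip()] = body.strip()

print(sections)
# {'Option 1': 'Nice text', 'Final stuff': 'Listing all\nof\nthe\ntext'}
```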