Josh M. - 29 days ago
Python Question

Regular expression to match with optional following text

I'm very new to regular expressions and I need some help finding the correct regular expression.

I have a text file of the form:

apple 4
bananas 5
bananas 5 7
apple 3
apple 6
bananas 3
bananas 4 5
apple 3
bananas 9


I am looking for a regular expression that will match the last occurrence of "bananas.*" after each "apple.*", keeping in mind that an "apple.*" line may not be followed by any "bananas.*" line at all. The regex should match the following:

bananas 5 7
bananas 4 5
bananas 9


Thanks in advance. I am doing this in Python, if that helps.

Jan
Answer

It actually is possible with regular expressions:

^apple.+[\n\r]
(?:(bananas.*)[\n\r]?)+

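See a demo on regex101.com. Mind the modifiers (multiline and verbose) and use group 1 of each match: a capturing group inside a repeated group keeps only its last iteration, so group 1 ends up holding the last "bananas" line after each "apple" line.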
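To see why group 1 gives the last occurrence, here is a minimal sketch on a single apple block. The variable names (`block`, `last_banana`) are made up for illustration, and the pattern is a one-line simplification of the one above (only `\n` line breaks, no flags needed).

```python
import re

# One "apple" line followed by two "bananas" lines (illustrative data).
block = "apple 4\nbananas 5\nbananas 5 7\n"

# The inner group (bananas.*) is re-captured on every repetition of the
# outer non-capturing group, so only the final repetition survives.
m = re.match(r"apple.+\n(?:(bananas.*)\n?)+", block)

whole_match = m.group(0)   # the apple line plus every bananas line
last_banana = m.group(1)   # only the last bananas line
print(last_banana)
```

The whole match spans all three lines, while group 1 contains just the final "bananas" line, which is the behavior the full pattern relies on.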


As full Python code:

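import re

text = """
apple 4
bananas 5
bananas 5 7
apple 3
apple 6
bananas 3
bananas 4 5
apple 3
bananas 9
"""

# MULTILINE lets ^ match at the start of every line;
# VERBOSE allows whitespace and comments inside the pattern.
rx = re.compile(r"""
        ^apple.+[\n\r]            # an apple line and its line break
        (?:(bananas.*)[\n\r]?)+   # one or more bananas lines; group 1 keeps the last
        """, re.MULTILINE | re.VERBOSE)

# Group 1 of each match is the last bananas line after an apple line.
bananas = [m.group(1) for m in rx.finditer(text)]
print(bananas)  # ['bananas 5 7', 'bananas 4 5', 'bananas 9']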

See a demo on ideone.com.
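For comparison, the same result can be cross-checked without a regex by walking the lines once. This is only a sketch of an alternative, not part of the answer above; the function name `last_bananas_per_apple` and the variable names are hypothetical. Unlike the regex, it does not require the "bananas" lines to follow the "apple" line directly, so the two approaches can differ if other kinds of lines appear in between.

```python
def last_bananas_per_apple(lines):
    """Return the last 'bananas' line seen after each 'apple' line.

    Apple lines with no bananas line after them contribute nothing.
    """
    results = []
    in_apple_block = False   # True once at least one apple line was seen
    last = None              # last bananas line since the most recent apple
    for raw in lines:
        line = raw.strip()
        if line.startswith("apple"):
            # A new apple closes the previous block.
            if last is not None:
                results.append(last)
            in_apple_block, last = True, None
        elif line.startswith("bananas") and in_apple_block:
            last = line
    # Close the final block at end of input.
    if last is not None:
        results.append(last)
    return results


sample = """
apple 4
bananas 5
bananas 5 7
apple 3
apple 6
bananas 3
bananas 4 5
apple 3
bananas 9
"""

result = last_bananas_per_apple(sample.splitlines())
print(result)
```

Because the function accepts any iterable of lines, an open file object could be passed in place of `sample.splitlines()`.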