Olian04 - 8 months ago
Python Question

How to: Overlapping match

Let's say we have these three inputs:

A2 A1 B. #1

A1 B. #2

A3 A1 A8 B. #3

How would I go about it if I want:

  1. To match:
    A2 A1 B.
    A1 B.

  2. To match:
    A1 B.

  3. To match:
    A3 A1 A8 B.
    A1 A8 B.
    A8 B.

So far I've got this regex:


But it won't match subsets of text that have already been matched (I'm matching using
). My guess is that
is doing just as it's supposed to, and I'm just trying to force it into doing stupid stuff.


Answer Source

You can use lookahead for this and capture values inside the lookahead:

regex = r"(?=((?:A\d+\s+)+B\.))"


RegEx Description:

(?=               # start lookahead (zero-width, consumes no characters)
   (              # start capturing group #1
      (?:         # start non-capturing group
         A\d+\s+  # match A followed by 1 or more digits followed by 1 or more whitespace characters
      )+          # end non-capturing group, repeated 1 or more times
      B\.         # match B and a literal dot
   )              # end capturing group #1
)                 # end lookahead
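
The annotated pattern can be compiled almost as written by passing Python's `re.VERBOSE` flag, which makes the engine ignore whitespace and `#` comments inside the pattern. This is only a minimal sketch; the name `OVERLAPPING` is an illustrative choice and not part of the original answer.

```python
import re

# OVERLAPPING is a hypothetical name chosen for this example.
OVERLAPPING = re.compile(r"""
    (?=                # start lookahead (consumes no characters)
       (               # start capturing group #1
          (?:          # start non-capturing group
             A\d+\s+   # A, 1 or more digits, 1 or more whitespace characters
          )+           # end non-capturing group, repeated 1 or more times
          B\.          # B followed by a literal dot
       )               # end capturing group #1
    )                  # end lookahead
""", re.VERBOSE)

matches = OVERLAPPING.findall("A3 A1 A8 B.")
print(matches)  # ['A3 A1 A8 B.', 'A1 A8 B.', 'A8 B.']
```

Keeping the comments inside the compiled pattern means the explanation and the regex cannot drift apart.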


>>> import re
>>> regex = r"(?=((?:A\d+\s+)+B\.))"

>>> print(re.findall(regex, 'A2 A1 B.'))
['A2 A1 B.', 'A1 B.']

>>> print(re.findall(regex, 'A1 B.'))
['A1 B.']

>>> print(re.findall(regex, 'A3 A1 A8 B.'))
['A3 A1 A8 B.', 'A1 A8 B.', 'A8 B.']
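
For comparison, here is a small sketch of why the lookahead matters. A pattern without it consumes the text it matches, so the scan resumes after the match and the shorter overlapping matches are never reported. The lookahead version matches zero characters, so the scan retries at every position. The example also uses `re.finditer` to show where each overlapping match starts. The variable names are illustrative only.

```python
import re

text = "A3 A1 A8 B."

# Consuming pattern: the whole run is matched once and then skipped over.
consuming = re.findall(r"(?:A\d+\s+)+B\.", text)

# Lookahead pattern: zero-width, so every start position gets its own attempt.
lookahead = re.compile(r"(?=((?:A\d+\s+)+B\.))")
overlapping = [(m.start(), m.group(1)) for m in lookahead.finditer(text)]

print(consuming)    # ['A3 A1 A8 B.']
print(overlapping)  # [(0, 'A3 A1 A8 B.'), (3, 'A1 A8 B.'), (6, 'A8 B.')]
```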