Olian04 - 17 days ago 8
Python Question

How to: Overlapping match

Let's say we have this:

A2 A1 B. #1

A1 B. #2

A3 A1 A8 B. #3


How would I go about it if I want:


  1. To match:
    A2 A1 B.
    and
    A1 B.


  2. To match:
    A1 B.


  3. To match:
    A3 A1 A8 B.
    and
    A1 A8 B.
    and
    A8 B.



So far I've got this regex:

A\d\s(.*\.)


But it won't match parts of the text that have already been matched (I'm matching using re.finditer). My guess is that re.finditer is doing just what it's supposed to, and I'm just trying to force it into doing stupid stuff.
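To make the behaviour described above concrete, here is a minimal sketch that runs the regex from the question over the third sample line. The variable names (pattern, text, matches) are only illustrative. Because re.finditer resumes scanning where the previous match ended, the greedy .* swallows the rest of the line and nothing is left for a second match:

```python
import re

# The regex from the question; names below are illustrative only.
pattern = re.compile(r"A\d\s(.*\.)")

text = "A3 A1 A8 B."

# finditer yields non-overlapping matches: after the first match consumes
# the whole line, scanning resumes at the end of the string.
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # ['A3 A1 A8 B.']
```

So the shorter candidates "A1 A8 B." and "A8 B." are never reported, since they lie entirely inside the text already consumed.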


Answer

You can use lookahead for this and capture values inside the lookahead:

regex = r"(?=((?:A\d+\s+)+B\.))"


RegEx Description:

(?=               # start lookahead
   (              # start capturing group #1
      (?:         # start non-capturing group
         A\d+\s+  # match A followed by 1 or more digits followed by 1 or more whitespace characters
      )+          # end non-capturing group, repeated 1 or more times
      B\.         # match B and literal DOT
   )              # end capture group #1
)                 # end lookahead

Code:

>>> import re
>>> regex = r"(?=((?:A\d+\s+)+B\.))"

>>> print(re.findall(regex, 'A2 A1 B.'))
['A2 A1 B.', 'A1 B.']

>>> print(re.findall(regex, 'A1 B.'))
['A1 B.']

>>> print(re.findall(regex, 'A3 A1 A8 B.'))
['A3 A1 A8 B.', 'A1 A8 B.', 'A8 B.']
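
Since the question uses re.finditer rather than re.findall, here is a minimal sketch of the same lookahead with finditer. The names (text, results) are illustrative. The lookahead itself is zero-width, so m.group(0) is an empty string and the overlapping text has to be read from capture group 1:

```python
import re

regex = re.compile(r"(?=((?:A\d+\s+)+B\.))")

text = "A3 A1 A8 B."

# Each match is zero-width (group 0 is empty), so the regex engine can
# retry at every following position; the text of interest is in group 1.
results = [(m.start(), m.group(1)) for m in regex.finditer(text)]

for start, captured in results:
    print(start, captured)
# 0 A3 A1 A8 B.
# 3 A1 A8 B.
# 6 A8 B.
```

Note that \s+ also matches newlines, so if several records are held in one multi-line string it may be safer to apply the regex line by line.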