braymp braymp - 27 days ago 11
Python Question

Single regular expression in Python with named groups for interleaved text

I would like to create a single regular expression in Python that extracts two interleaved portions of text from a filename as named groups. An example filename is given below:

CM00626141_H12.d4_T0001F003L01A02Z03C02.tif


The part of the filename I'd like to extract is contained between the underscores, and consists of the following:


  • An uppercase letter: [A-H]

  • A zero-padded two-digit number: 01 to 12

  • A period

  • A lowercase letter: [a-d]

  • A single digit: 1 to 4



For the example above, I would like one group ('Row') to contain H.d, and the other group ('Column') to contain 12.4. However, I don't know how to do this when the text is separated as it is here.

EDIT: A constraint which I omitted: it needs to be a single regex to handle the string. I've updated the text/title to reflect this point.

Answer

Regex capturing groups (whether numbered or named) do not store a copy of the text; each one records a start and end index into the original string. A single group can therefore only cover one contiguous span, so it cannot capture non-contiguous text such as H.d here. Probably the best thing to do is to use one regex with four separate groups and combine them into your two desired values manually.
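As a rough sketch of that approach, assuming the field lies between underscores exactly as in the example filename: a single compiled pattern with four named groups, followed by ordinary string formatting to build 'Row' and 'Column'. The group names (row_letter, col_number, row_sub, col_digit) and the helper parse_well are made up for illustration and are not part of the question.

```python
import re

# One regex, four contiguous named groups. The group names are
# illustrative; only 'Row' and 'Column' come from the question.
PATTERN = re.compile(
    r"_(?P<row_letter>[A-H])"         # uppercase letter A-H
    r"(?P<col_number>0[1-9]|1[0-2])"  # zero-padded number 01-12
    r"\."                             # literal period
    r"(?P<row_sub>[a-d])"             # lowercase letter a-d
    r"(?P<col_digit>[1-4])_"          # single digit 1-4
)


def parse_well(filename):
    """Return {'Row': ..., 'Column': ...} or None if nothing matches."""
    match = PATTERN.search(filename)
    if match is None:
        return None
    # Each group is one contiguous span; the interleaving is undone here,
    # outside the regex, by joining the pieces.
    row = f"{match['row_letter']}.{match['row_sub']}"
    column = f"{match['col_number']}.{match['col_digit']}"
    return {"Row": row, "Column": column}
```

Calling parse_well("CM00626141_H12.d4_T0001F003L01A02Z03C02.tif") returns {'Row': 'H.d', 'Column': '12.4'}. It is still a single regex applied to the string; only the final join happens in plain Python.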