Duncan Macleod - 6 months ago
Python Question

How can I ignore a string in python regex group matching?

Say I have the following string



>>> mystr = 'A-ABd54-Bf657'


(a random string of dash-delimited character groups) and want to match the opening part, and the rest of the string, in separate groups. I can use

>>> re.match('(?P<a>[a-zA-Z0-9]+)-(?P<b>[a-zA-Z0-9-]+)', mystr)


This produces a groupdict() like this:

{'a': 'A', 'b': 'ABd54-Bf657'}


How can I get the same regex to match group b, but separately match a specific suffix (or one of a set of suffixes) if it is present? Ideally something like this

>>> myregex = <help me here>
>>> re.match(myregex, 'A-ABd54-Bf657').groupdict()
{'a': 'A', 'b': 'ABd54-Bf657', 'test': None}
>>> re.match(myregex, 'A-ABd54-Bf657-blah').groupdict()
{'a': 'A', 'b': 'ABd54-Bf657-blah', 'test': None}
>>> re.match(myregex, 'A-ABd54-Bf657-test').groupdict()
{'a': 'A', 'b': 'ABd54-Bf657', 'test': 'test'}


Thanks.

Answer
import re
mystr = 'A-ABd54-Bf657'
re.match('(?P<a>[a-zA-Z0-9]+)-(?P<b>[a-zA-Z0-9-]+?)(?:-(?P<test>test))?$', mystr)
                                                 ^                    ^

The first indicated ? makes the + quantifier non-greedy, so that it consumes the minimum possible.

The second indicated ? makes the group optional.

The $ is necessary: without it, the non-greedy + stops after a single character and the optional group matches the empty string, so b would be just 'A' and test would be None.
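
To check the points above, here is a minimal sketch that runs the pattern against the three strings from the question. The build_regex helper and the extra 'prod' suffix are hypothetical names added for illustration (they are not part of the original answer); the helper only shows one way the "set of suffixes" case might be handled, by joining escaped alternatives with |.

```python
import re


def build_regex(suffixes=("test",)):
    """Compile the answer's pattern for one or more literal suffixes.

    Hypothetical helper: the group is still named 'test' to match the
    question's expected output, whatever suffixes are allowed.
    """
    # Escape each suffix so it is matched literally, then join as alternatives.
    alternatives = "|".join(re.escape(s) for s in suffixes)
    return re.compile(
        r"(?P<a>[a-zA-Z0-9]+)-"                    # opening part, up to the first dash
        r"(?P<b>[a-zA-Z0-9-]+?)"                   # rest of the string, non-greedy
        r"(?:-(?P<test>" + alternatives + r"))?$"  # optional suffix, anchored at the end
    )


myregex = build_regex()

# The three cases from the question.
plain = myregex.match("A-ABd54-Bf657").groupdict()
other = myregex.match("A-ABd54-Bf657-blah").groupdict()
tagged = myregex.match("A-ABd54-Bf657-test").groupdict()
print(plain)
print(other)
print(tagged)

# Same pattern with the trailing $ removed: b stops after one character.
unanchored = re.compile(myregex.pattern[:-1])
short = unanchored.match("A-ABd54-Bf657-test").groupdict()
print(short)

# Assumed second suffix 'prod', purely to illustrate a set of suffixes.
multi = build_regex(("test", "prod")).match("A-ABd54-Bf657-prod").groupdict()
print(multi)
```

The first three dictionaries should match the output requested in the question. Passing each suffix through re.escape keeps characters such as . or + in a suffix from being read as regex syntax.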