stenci - 2 months ago
Python Question

Regular expression matching all but a string

I need to find all the strings matching a pattern with the exception of two given strings.

For example, find all groups of letters with the exception of aa and bb. Starting from this string:

-a-bc-aa-def-bb-ghij-


it should return:

('a', 'bc', 'def', 'ghij')


I tried with this regular expression that captures 4 strings. I thought I was getting close, but (1) it doesn't work in Python and (2) I can't figure out how to exclude a few strings from the search. (Yes, I could remove them later, but my real regular expression does everything in one shot and I would like to include this last step in it.)

I said it doesn't work in Python because I tried this, expecting the exact same result, but instead I get only the first group:

>>> import re
>>> re.search(r'-(\w.*?)(?=-)', '-a-bc-def-ghij-').groups()
('a',)


I tried a negative lookahead, but I couldn't find a working solution for this case.

Answer

You can use a negative lookahead.

For example,

>>> string = '-a-bc-aa-def-bb-ghij-'
>>> re.findall(r'-(?!aa|bb)([^-]+)', string)
['a', 'bc', 'def', 'ghij']

  • - Matches -

  • (?!aa|bb) Negative lookahead, asserts that the - is not followed by aa or bb

  • ([^-]+) Matches one or more characters other than -
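
As a side note on the question's first problem: re.search stops at the first match, which is why only one group came back, while re.findall scans the whole string and returns every captured group. A minimal sketch using the pattern and input from the question (the variable names are only illustrative):

```python
import re

text = '-a-bc-def-ghij-'
pattern = r'-(\w.*?)(?=-)'

# re.search returns only the first match, so .groups() holds one item.
first = re.search(pattern, text).groups()

# re.findall keeps scanning and collects the captured group of every match.
every = re.findall(pattern, text)

print(first)  # ('a',)
print(every)  # ['a', 'bc', 'def', 'ghij']
```

The lookahead (?=-) does not consume the trailing -, so that same - can start the next match.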


Edit

The regex above also skips any group that merely starts with aa or bb, such as aabc in -aabc-. To exclude only the exact groups aa and bb, add the trailing - to each alternative in the lookahead:

>>> re.findall(r'-(?!aa-|bb-)([^-]+)', string)
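
To see the difference between the two patterns, here is a small comparison on a made-up input that contains groups starting with aa and bb. The test strings and variable names are assumptions chosen for illustration. The last pattern is not part of the answer above; it is a possible variant for inputs that might not end with -, where a final aa or bb has no trailing - for the lookahead to see.

```python
import re

text = '-a-aabc-aa-bb-bbq-ghij-'

# First pattern: rejects every group that starts with aa or bb.
loose = re.findall(r'-(?!aa|bb)([^-]+)', text)

# Edited pattern: rejects only the exact groups aa and bb.
strict = re.findall(r'-(?!aa-|bb-)([^-]+)', text)

print(loose)   # ['a', 'ghij']
print(strict)  # ['a', 'aabc', 'bbq', 'ghij']

# Hypothetical variant: also treat end-of-string as a group boundary.
no_trailing_dash = '-a-aa'
edited = re.findall(r'-(?!aa-|bb-)([^-]+)', no_trailing_dash)
anchored = re.findall(r'-(?!(?:aa|bb)(?:-|$))([^-]+)', no_trailing_dash)

print(edited)    # ['a', 'aa']
print(anchored)  # ['a']
```

If the input is always delimited by - on both ends, as in the question, the edited pattern from the answer is enough.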