^(\w+,?\s?)+(?=:): hi hey\?$
Aaaaaaaaa, bbbbbbbb, cccccccc, dddddddddd, eeeeeeeee: hi?
import re
reg = re.compile(r'^(\w+,?\s?)+(?=:): hi hey\?$')
# The input does not match, so this search hangs instead of returning None.
print(reg.search('Aaaaaaaaa, bbbbbbbb, cccccccc, dddddddddd, eeeeeeeee: hi?'))
aa, bb: qqq
aa, : qqq?
aa, bb, cc:?
, bb, cc: qqq?
aa, bb: qq?
aa, b, c,d,e,f, g, h: qq?
aa, bb, cc: qq ee ff gggg hhhh?
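It hangs because the nested quantifier in (\w+,?\s?)+ gives the engine too many ways to split the input.
I.e. the match becomes too complex on failure.
It works fine if it matches and blows up if it fails.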
(\w,?\s?)+ matches the same strings as
(\w+,?\s?)+ but won't hang.
So, change it to this
^(\w,?\s?)+(?=:): hi hey\?$ and problem solved.
As a bonus, this
^(\w,?\s?)+: hi hey\?$ is equivalent, since the lookahead (?=:) is followed directly by a literal colon.
Also, you can substitute
.*?\?$ in place of your literal
if the text after the colon is expected to vary.
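Here is a minimal Python sketch of the fix, using the failing input from the question. The names fixed, fixed_lookahead, variable_tail, failing and matching are made up for illustration, and the variable_tail pattern is only one reading of the substitution suggested above (everything after the colon replaced by .*?\?$).

```python
import re

# One \w per iteration, so the group no longer nests one quantifier in another.
fixed = re.compile(r'^(\w,?\s?)+: hi hey\?$')

# Same pattern with the redundant lookahead left in.
fixed_lookahead = re.compile(r'^(\w,?\s?)+(?=:): hi hey\?$')

# Assumption: the literal " hi hey" is swapped for a lazy wildcard.
variable_tail = re.compile(r'^(\w,?\s?)+:.*?\?$')

failing = 'Aaaaaaaaa, bbbbbbbb, cccccccc, dddddddddd, eeeeeeeee: hi?'
matching = 'Aaaaaaaaa, bbbbbbbb, cccccccc, dddddddddd, eeeeeeeee: hi hey?'

for name, rx in (('fixed', fixed),
                 ('fixed_lookahead', fixed_lookahead),
                 ('variable_tail', variable_tail)):
    # Each search returns promptly, whether or not the input matches.
    print(name, bool(rx.search(failing)), bool(rx.search(matching)))
```

The first two patterns reject the failing input and accept the matching one. The wildcard version accepts both, because it no longer cares what follows the colon.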
Error: Target Operation .. The complexity of matching the regular expression exceeded predefined bounds. Try refactoring the regular expression to make each choice made by the state machine unambiguous. This exception is thrown to prevent "eternal" matches that take an indefinite period time to locate.
Note that there is always a potential problem with nested quantifiers.
I.e. those that are greedy and open ended, like (b
This can almost be cured by removing an inner nest (like
b+ in the example).
Once it is unquantified, we can call it a pseudo anchor.
That is, it should come first in the group and be an unquantified, required character.
On backtrack, this forces the engine to return to that character and check it again.
Because it is not quantified, the engine gives up immediately and will not even look at
the rest of the expression.
It then moves past that position in the string to find the next literal.
That is basically what this backtracking cure is all about.
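To put a rough number on the nested quantifier problem, the sketch below counts how many ways a group like (\w+)+ can divide a run of n word characters into pieces. It illustrates the growth only and is not a model of any particular engine. The function name count_splits is made up for this example. With the unquantified (\w)+ form there is exactly one way, one character per iteration.

```python
def count_splits(n):
    # Number of ways to cut a run of n word characters into one or more
    # non-empty pieces, i.e. the distinct ways (\w+)+ could consume it.
    ways = [0] * (n + 1)
    ways[0] = 1  # the empty prefix can be split in exactly one way
    for k in range(1, n + 1):
        # The last piece starts at position j and covers characters j..k-1.
        ways[k] = sum(ways[j] for j in range(k))
    return ways[n]


for n in (1, 5, 10, 20):
    print(n, count_splits(n))
```

The count doubles with every extra character (2 to the power n-1), which is why a failing match on a long word can appear to hang.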
Given the backtrack pitfalls, we can make a solution to get the desired match.
^                            # BOS
\s*                          # Wsp trim
(                            # (1 start), Values - minimum of 2 required
     \w+ \s*                      # First word
     (?: [,\s] \s* \w+ )+         # One or more space or comma separated
                                  # word values
)                            # (1 end)
\s*                          # Wsp trim
:                            # Colon
\s*                          # Wsp trim
(                            # (2 start), Question -
     [^:]*?                       # Not a colon
     \w                           # At least a word char
     [^:]*?                       # Not a colon
)                            # (2 end)
\s*                          # Wsp trim
\?                           # '?'
\s*                          # Wsp trim
$                            # EOS
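The expanded pattern above can be tried in Python with re.VERBOSE, which ignores the layout whitespace and the comments. This is a sketch run against the sample lines from the question. PATTERN, SAMPLES and parse are names chosen here for illustration.

```python
import re

PATTERN = re.compile(r"""
    ^ \s*                       # start of string, trim whitespace
    (                           # group 1: values, minimum of 2 required
        \w+ \s*                 #   first word
        (?: [,\s] \s* \w+ )+    #   one or more space or comma separated words
    )
    \s* : \s*                   # colon, with optional whitespace around it
    (                           # group 2: question
        [^:]*? \w [^:]*?        #   no colon, at least one word character
    )
    \s* \? \s* $                # question mark, trim whitespace, end of string
""", re.VERBOSE)

SAMPLES = [
    'aa, bb: qqq',
    'aa, : qqq?',
    'aa, bb, cc:?',
    ', bb, cc: qqq?',
    'aa, bb: qq?',
    'aa, b, c,d,e,f, g, h: qq?',
    'aa, bb, cc: qq ee ff gggg hhhh?',
]


def parse(line):
    # Return (values, question) when the line matches, otherwise None.
    m = PATTERN.match(line)
    return (m.group(1), m.group(2)) if m else None


for line in SAMPLES:
    print(repr(line), '->', parse(line))
```

The first four samples are rejected (no trailing question mark, a missing second value, an empty question, a leading comma). The last three match, with the values in group 1 and the question text in group 2.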