user2362824 - 1 year ago
Python Question

Negative lookahead assertion in python

I have the two log lines below, and I want a separate regular expression to match each of them. Matching the second log line is no problem, but I am having trouble designing an expression for the first one. The name "Reset Reason test" is just an example of a test name; the number of words in it may vary, so I cannot define a more specific pattern for it.

12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test
12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test done.

I have a regular expression that generally does what I want:

([0-9:. ]*) .*SCR_OUTPUT: #### (TC_[a-fA-F0-9]{4,5}[:0-9]{0,4}) .*[ ](?!done\.$)

I have two cases that I want to differentiate.
I based my expression on the example given here.

Everything works fine when it ends like this (of course, I have to modify my test strings):


When I change it to something that suits me better (my log line has a dot at the end):


Then it gets weird.
Another adaptation, where it should be followed by a space and not by a dot, gives crazy results: every line produces a match.


I have been testing this online; under this link you can find the latest version of my experiment.

Does anyone know where my bug is?
Is it possible at all to match such a case?
Maybe I should do it in two steps?

Answer

If you want to avoid matching lines that end with done., you need a negative lookahead, preferably anchored at the start of the line:

^(?!.* done\.$)([0-9.]+\s+[\d:]+)\s+SCR_OUTPUT:\s*####\s*(TC_\w+).*

See the regex demo (remember to use the re.M flag to make ^ match the beginning of the line rather than the string start if you have multiline string input).
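As a quick check in Python, the pattern can be run over a multiline string with re.M. This is only a minimal sketch: the names LOG, PATTERN and matched_lines are illustrative, and the input is just the two sample lines from the question.

```python
import re

# The two sample log lines from the question, joined into one multiline string.
LOG = "\n".join([
    "12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test",
    "12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test done.",
])

# re.M makes ^ and $ match at the start and end of every line,
# not only at the start and end of the whole string.
PATTERN = re.compile(
    r"^(?!.* done\.$)([0-9.]+\s+[\d:]+)\s+SCR_OUTPUT:\s*####\s*(TC_\w+).*",
    re.M,
)

# Collect the full text of every line that matched.
matched_lines = [m.group(0) for m in PATTERN.finditer(LOG)]
print(matched_lines)
```

With this input only the first line (the one without done. at the end) should be collected.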

Note that I also tightened the regex pattern for the strings you supplied: changing the initial part to ([0-9.]+\s+[\d:]+)\s+ greatly reduces backtracking. Consider using something similar if this exact pattern does not match all your data.
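The two capturing groups then hold the timestamp and the test case ID. A small sketch of pulling them out one line at a time; parse_line is a hypothetical helper name, not something from your code:

```python
import re

# Pattern from the answer; re.M is not needed when matching a single line.
PATTERN = re.compile(
    r"^(?!.* done\.$)([0-9.]+\s+[\d:]+)\s+SCR_OUTPUT:\s*####\s*(TC_\w+).*"
)

def parse_line(line):
    """Return (timestamp, test_case_id), or None if the line does not match
    (for example, because it ends in ' done.')."""
    m = PATTERN.match(line)
    return (m.group(1), m.group(2)) if m else None

start_line = "12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test"
done_line = "12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test done."

print(parse_line(start_line))  # timestamp and test case ID of the first line
print(parse_line(done_line))   # None, the line ends with " done."
```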

Anyway, the core point of interest is the lookahead (?!.* done\.$). It fails the match right at the start of the line if, after 0+ characters other than a newline, as many as possible (.*), there is a space followed by done. at the end of the line ( done.).
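To see why the pattern from the question matches both lines, it may help to run it next to the anchored version. With the lookahead placed after .*[ ], the engine is free to backtrack .* to an earlier space (the one before test), where done. does not follow, so the lookahead succeeds there. A minimal sketch; the variable names are only illustrative:

```python
import re

start_line = "12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test"
done_line = "12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test done."

# Pattern from the question: the lookahead sits after ".*[ ]".
trailing = re.compile(
    r"([0-9:. ]*) .*SCR_OUTPUT: #### (TC_[a-fA-F0-9]{4,5}[:0-9]{0,4}) .*[ ](?!done\.$)"
)

# Pattern from the answer: the lookahead is checked once, at the line start.
anchored = re.compile(
    r"^(?!.* done\.$)([0-9.]+\s+[\d:]+)\s+SCR_OUTPUT:\s*####\s*(TC_\w+).*"
)

# The trailing lookahead is rejected at the space before "done.", but the
# engine backtracks to the space before "test" and the match succeeds there.
m = trailing.search(done_line)
print(repr(m.group(0)))  # the match stops right after "Reset Reason "

# The anchored lookahead scans the whole line first, so backtracking
# cannot find another position at which to retry it.
print(anchored.search(start_line) is not None)
print(anchored.search(done_line) is not None)
```

The question's pattern matches both lines, while the anchored one matches only the line without done. at the end.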