user2362824 - 4 months ago 20
Python Question

Negative lookahead assertion in Python

I have the two log lines below, and I want a separate regular expression to find each of them. Matching the second line is no problem, but I have trouble designing an expression for the first one. The name "Reset Reason test" is just an example test name; the number of words in it may vary, so I cannot define any more specific pattern than just ".*".

12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test
12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test done.


I have a regular expression that generally does what I want:

([0-9:. ]*) .*SCR_OUTPUT: #### (TC_[a-fA-F0-9]{4,5}[:0-9]{0,4}) .*[ ](?!done\.$)


There are two cases that I want to differentiate. I based my approach on the example given here:
https://docs.python.org/3/howto/regex.html#lookahead-assertions

Everything works fine when the pattern ends like this (of course, I have to modify my test strings accordingly):

[.](?!done$)


When I change it to something that suits me better (my "done." has a dot at the end), e.g.:

[.](?!done\.$)


Then it gets weird. In another adaptation, the character in front of "done." should be a space rather than a dot, and the results go crazy: every line gives a positive match.

[ ](?!done\.$)
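A minimal check of the behaviour described above, assuming Python 3's re module and the two log lines from the question. It runs the full pattern from the top of the question (which ends in [ ](?!done\.$)) with re.search: because the lookahead follows .*[ ], the engine can backtrack to an earlier space that is not followed by "done.", so both lines match. For comparison it also tries a variant modelled on the HOWTO's .*[.](?!bat$)[^.]*$, where a trailing [^ ]*$ (added here for illustration, not part of the question) pins [ ] to the last space on the line.

```python
import re

# The two log lines from the question.
LINES = [
    "12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test",
    "12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test done.",
]

# The question's pattern: the lookahead comes after ".*[ ]", so the engine may
# backtrack to ANY space that is not followed by "done." at the end.
ORIGINAL = re.compile(
    r"([0-9:. ]*) .*SCR_OUTPUT: #### (TC_[a-fA-F0-9]{4,5}[:0-9]{0,4}) .*[ ](?!done\.$)"
)

# HOWTO-style variant (illustrative): "[^ ]*$" forces "[ ]" to be the LAST
# space on the line, so the lookahead cannot be satisfied at an earlier space.
PINNED = re.compile(
    r"([0-9:. ]*) .*SCR_OUTPUT: #### (TC_[a-fA-F0-9]{4,5}[:0-9]{0,4}) .*[ ](?!done\.$)[^ ]*$"
)

original_hits = [bool(ORIGINAL.search(line)) for line in LINES]
pinned_hits = [bool(PINNED.search(line)) for line in LINES]

print(original_hits)  # [True, True]   both lines match (the reported bug)
print(pinned_hits)    # [True, False]  only the line without " done." matches
```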


I have been testing this on pythex.org. Under this link you can find the latest version of my experiment.

Does anyone know where my bug is?
Is it possible at all to match such a case?
Or should I do it in two steps?

Answer

If you want to exclude lines that end with " done.", you need a negative lookahead, preferably anchored at the start of the line:

^(?!.* done\.$)([0-9.]+\s+[\d:]+)\s+SCR_OUTPUT:\s*####\s*(TC_\w+).*
 ^^^^^^^^^^^^^^

See the regex demo (remember to use the re.M flag so that ^ and $ match at the start and end of each line, rather than only at the start and end of the whole string, if your input is a multiline string).
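A small sketch of why re.M matters, assuming the log arrives as one multiline string (the sample text is the question's two lines, with the "done." line placed first on purpose). Without re.M, ^ can only match at the very start of the string and $ only at its end (or just before a final newline), so $ cannot match at the end of the first line, the lookahead never fires, and the "done." line slips through.

```python
import re

# The answer's pattern, unchanged.
PATTERN = r"^(?!.* done\.$)([0-9.]+\s+[\d:]+)\s+SCR_OUTPUT:\s*####\s*(TC_\w+).*"

# The question's two lines; the "done." line comes first on purpose.
LOG = (
    "12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test done.\n"
    "12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test\n"
)

# With re.M, ^ and $ work per line: only the line without " done." matches.
with_m = [m.group(0) for m in re.finditer(PATTERN, LOG, re.M)]

# Without re.M, $ cannot match at the end of the first line, so the
# lookahead does not fire and the "done." line is matched instead.
without_m = [m.group(0) for m in re.finditer(PATTERN, LOG)]

print(with_m)
print(without_m)
```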

Note that I adjusted the pattern to the strings you supplied: rewriting the initial part as ([0-9.]+\s+[\d:]+)\s+ greatly reduces backtracking. If this exact pattern does not match all of your data, consider using something similar.
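If the log is processed one line at a time (for example while iterating over a file), re.M is not needed. A minimal sketch, assuming that setup; parse_start_line is a hypothetical helper name, and it simply returns the answer's two capture groups (timestamp and test ID). Since $ also matches just before a trailing newline, lines read from a file do not need to be stripped first.

```python
import re

# The answer's pattern; group 1 is the timestamp, group 2 the test ID.
PATTERN = re.compile(
    r"^(?!.* done\.$)([0-9.]+\s+[\d:]+)\s+SCR_OUTPUT:\s*####\s*(TC_\w+).*"
)


def parse_start_line(line):
    """Hypothetical helper: (timestamp, test_id) for a start line, else None."""
    m = PATTERN.match(line)
    return m.groups() if m else None


print(parse_start_line("12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test"))
# ('12.07.2016 13:54:20', 'TC_0006')
print(parse_start_line("12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test done.\n"))
# None
```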

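In any case, the key point is the lookahead (?!.* done\.$): it makes the match fail immediately if, after 0+ characters other than a newline, as many as possible (.*), the line ends with a space followed by done. (" done."). Anchoring it with ^ matters: in the question's pattern the lookahead comes after .*[ ], so the engine can simply backtrack to an earlier space that is not followed by done. and still report a match.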
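The question also asks whether this should be done in two steps. For comparison, a minimal sketch of that alternative (not the answer's approach): a regex for the common prefix only, followed by a plain str.endswith check, so no lookahead is involved. is_start_line is a hypothetical name; it strips a trailing newline so that lines read from a file behave the same as bare strings.

```python
import re

# Step 1 pattern: the common prefix of both log lines, no lookahead.
PREFIX = re.compile(r"([0-9.]+\s+[\d:]+)\s+SCR_OUTPUT:\s*####\s*(TC_\w+)")


def is_start_line(line):
    """Hypothetical helper: True for a test-start line, False otherwise."""
    line = line.rstrip("\n")
    # Step 1: does the line look like a test log line at all?
    if PREFIX.match(line) is None:
        return False
    # Step 2: reject the line that reports the end of the test.
    return not line.endswith(" done.")


for text in (
    "12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test",
    "12.07.2016 13:54:20 SCR_OUTPUT: #### TC_0006 Reset Reason test done.",
):
    print(is_start_line(text))
```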
