mkHun - 1 year ago 51
Perl Question

Why is negative lookahead not considered in backtracking?

Suppose I have a string like this:

121456 word123word456word897 10:10

My condition is to check for 10:10 at the end of the string. So I wrote the pattern as follows:

$s = "121456 word123word456word897 10:10";
if ($s =~ m/\d+(.+)(?=10:10)/) {
    print "$1\n";
    print "Hello ";
}

Here .+ matches up to the end of the string, and then the engine backtracks so that the lookahead can match 10:10.

The problem is that when I write the condition with a negative lookahead, the regex engine does not backtrack.

if ($s =~ m/\d+(.+)(?!10:10)/) {
    print "$1\n";
    print "Hello ";
}

It prints word123word456word897 10:10, so the negative lookahead does not seem to be considered in backtracking. What is the problem with this negative lookahead?

Answer Source

NOTE: I will explain this from my own understanding. All corrections are welcome.

Why is this happening here?

By default, the aim of a regex engine is to satisfy every required condition in order to find a match in a string. It does this through simple matching, backtracking, and jumping to different saved states (usually supported by NFA engines) whenever the current state fails to satisfy the regex.

Once all the conditions are met, the requirement is fulfilled and the engine stops checking anything else. There is no further need for backtracking, matching, or anything fancy, because the requirements are already met.
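As a minimal sketch of that idea (the string "aaab" and the variable names are my own illustration, not taken from the question), a greedy quantifier only gives characters back when something after it would otherwise fail:

```perl
use strict;
use warnings;

my $text = "aaab";   # hypothetical sample string

# Nothing after a+ fails here, so the engine never needs to backtrack.
my ($no_backtrack) = $text =~ /(a+)b/;

# a+ first takes "aaa", but then the literal "a" cannot match "b",
# so the engine gives one "a" back and tries again.
my ($one_step_back) = $text =~ /(a+)ab/;

print "$no_backtrack\n";    # prints aaa
print "$one_step_back\n";   # prints aa
```

In both cases the engine stops at the first way of satisfying the whole pattern; it does not look for other, shorter captures once it has succeeded.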

Now, coming back to your question, this is your string:

121456    word123word456word897   10:10

In your first regex, \d+(.+)(?=10:10):

i) \d+ matches all the leading digits <-- No problem

ii) As .+ is greedy, it matches the rest of the string and moves to the end <-- No problem

iii) To satisfy the next condition (?=10:10), there is no string left. So not all the conditions are fulfilled, and to meet this one the regex engine starts backtracking until it reaches a position that is followed by 10:10
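The steps above can be checked with a small script. This is only a sketch using the single-spaced string from the question; the variable name $positive and the square brackets (added to make the captured whitespace visible) are my own:

```perl
use strict;
use warnings;

my $s = "121456 word123word456word897 10:10";

# .+ first runs to the end of the string, the positive lookahead fails
# there, and the engine gives characters back until 10:10 follows.
my ($positive) = $s =~ /\d+(.+)(?=10:10)/;

print "[$positive]\n";   # prints [ word123word456word897 ]
```

The capture stops just before 10:10, which shows that .+ had to give back the last five characters.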

In your second regex, \d+(.+)(?!10:10):

i, ii) The first two steps are exactly the same as above

iii) To satisfy the next condition (?!10:10), whatever follows the current position must not match 10:10. Because .+ is greedy, we have already reached the end of the string ($), and the end of the string obviously does not match 10:10. Hence all of our conditions are fulfilled, and there is no need to backtrack or do anything else.
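A short script can confirm this. It is a sketch under my own assumptions: the variable names are made up, and the second pattern with (?!\z) is not from the question; I added it only to show that a negative lookahead does cause backtracking when it actually fails:

```perl
use strict;
use warnings;

my $s = "121456 word123word456word897 10:10";

# At the end of the string nothing follows, so (?!10:10) succeeds
# immediately and .+ keeps everything it consumed.
my ($negative) = $s =~ /\d+(.+)(?!10:10)/;

# Here the negative lookahead fails at the end of the string
# (\z matches there), so the engine backtracks by one character.
my ($not_at_end) = $s =~ /\d+(.+)(?!\z)/;

print "[$negative]\n";     # prints [ word123word456word897 10:10]
print "[$not_at_end]\n";   # prints [ word123word456word897 10:1]
```

So the negative lookahead is not skipped during backtracking; in the original pattern it simply never fails, so there is nothing to backtrack for.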

A picture is worth a thousand words

[Image for \d+.+(?=10:10)]

[Image for \d+.+(?!10:10)]

Image credit :-