John Doe - 4 months ago
Perl Question

Perl greedy regex is not acting greedy

Given the following code:


use strict;
use warnings;

my $text = "asdf(blablabla)";

$text =~ s/(.*?)\((.*)\)/$2/;
print "\nfirst match: $1";
print "\nsecond match: $2";


I expected that $2 would catch my last bracket, yet my output is:

first match: asdf
second match: blablabla

If .* is greedy by default, why did it stop at the bracket?

Answer

The .* is a greedy subpattern, but it does not account for grouping. Grouping is defined with a pair of unescaped parentheses (see Use Parentheses for Grouping and Capturing).

See where your group boundaries are:

s/(.*?)\((.*)\)/$2/
  | G1|  |G2| 

So, the \( and \) that match the literal ( and ) are outside the groups, and will be part of neither $1 nor $2.
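One way to check the group boundaries yourself is to print the offsets Perl records for each group. The following is a minimal sketch using the built-in @- and @+ arrays (start and end offsets of the whole match and of each group); the @spans variable is only an illustrative helper, not part of the original code.

```perl
use strict;
use warnings;

my $text = "asdf(blablabla)";

# Collect [start, end] offsets for the whole match (0) and groups 1 and 2.
my @spans;
if ($text =~ /(.*?)\((.*)\)/) {
    @spans = map { [ $-[$_], $+[$_] ] } 0 .. 2;
}

printf "whole match: %d..%d\n", @{ $spans[0] };   # whole match: 0..15
printf "group 1:     %d..%d\n", @{ $spans[1] };   # group 1:     0..4
printf "group 2:     %d..%d\n", @{ $spans[2] };   # group 2:     5..14
```

The characters at offsets 4 and 14, the ( and the ), fall inside the whole match but outside both group ranges.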

If you need the ) to be part of $2, use

s/(.*?)\((.*\))/$2/
              ^
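To compare the two patterns side by side, here is a small sketch that uses a plain match in list context instead of a substitution, so $text is left unchanged. The variable names ($g1, $g2, $h1, $h2) are chosen only for illustration.

```perl
use strict;
use warnings;

my $text = "asdf(blablabla)";

# Original pattern: the escaped parentheses sit outside both groups.
my ($g1, $g2) = $text =~ /(.*?)\((.*)\)/;

# Adjusted pattern: the escaped \) is moved inside the second group.
my ($h1, $h2) = $text =~ /(.*?)\((.*\))/;

printf "original: \$1=%s \$2=%s\n", $g1, $g2;   # original: $1=asdf $2=blablabla
printf "adjusted: \$1=%s \$2=%s\n", $h1, $h2;   # adjusted: $1=asdf $2=blablabla)
```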

A regex engine processes both the string and the pattern from left to right. The first (.*?) is handled first. Because it is lazy (it matches as few characters as possible while still allowing the overall match to succeed), it matches up to the first literal ( symbol, and the whole part before the ( is stored in Group 1. Then the ( is matched but not captured, and (.*) matches any 0+ characters other than a newline up to the last ) symbol and stores them in Group 2. Finally, the ) is matched, again without being captured.

The point is that .* first grabs the whole rest of the string, but then backtracking happens because the engine still has to accommodate the final ) in the pattern. The ) must be matched, but it is not captured by your pattern, so it is not part of Group 2 because of where the group boundary is placed. You can step through the regex debugger at this regex demo page to see how the pattern matches your string.
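The difference between greedy and lazy matching only becomes visible when the string contains more than one candidate ). The sketch below uses a made-up input with two closing parentheses (not from the original question) to show where each quantifier stops.

```perl
use strict;
use warnings;

# Hypothetical input with two ) characters, chosen for illustration only.
my $text = "asdf(one)two)";

# Greedy: .* runs to the end of the string, then backtracks one character
# so that the trailing \) can match the last ) in the string.
my ($greedy) = $text =~ /\((.*)\)/;

# Lazy: .*? grows one character at a time and stops at the first ).
my ($lazy) = $text =~ /\((.*?)\)/;

print "greedy: $greedy\n";   # greedy: one)two
print "lazy:   $lazy\n";     # lazy:   one
```

In both cases the closing ) is consumed by \) outside the group, so it never appears in the capture. If you want to watch the backtracking locally, Perl's `use re 'debug';` pragma prints the engine's steps to STDERR.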
