Jim Jim - 6 months ago 36
Perl Question

When is \G useful in a regex?

I am not clear on the use/need of the \G operator.

I read in the perldoc:


You use the \G anchor to start the next match on the same string where
the last match left off.


I don't really understand this statement. When we use /g, we usually move to the character after the last match anyway.

As the example shows:

$_ = "1122a44";
my @pairs = m/(\d\d)/g; # qw( 11 22 44 )


Then it says:


If you use the \G anchor, you force the match after 22 to start with
the a:


$_ = "1122a44";
my @pairs = m/\G(\d\d)/g;



The regular expression cannot match there since it does not
find a digit, so the next match fails and the match operator returns
the pairs it already found


I don't understand this either. "If you use the \G anchor, you force the match after 22 to start with a." But without the \G, the match will be attempted at a anyway, right? So what does this sentence mean?

I see that in the example the only pairs printed are 11 and 22. So 44 is not tried.

The example also shows that using the c option makes it find 44 after the while loop.

To be honest, from all of this I cannot understand what this operator is useful for or when it should be applied.

Could someone please help me understand this, perhaps with a meaningful example?

Update

I think I did not understand this key sentence:


If you use the \G anchor, you force the match after 22 to start with
the a . The regular expression cannot match there since it does not
find a digit, so the next match fails and the match operator returns
the pairs it already found.


This seems to mean that when the match fails at that position, the regex makes no further attempts later in the string, which is consistent with the examples in the answers.

Also:


After the match fails at the letter a , perl resets pos() and the next
match on the same string starts at the beginning.

Answer

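\G is an anchor: it matches only at the position where the previous /g match on that string ended (pos(), or the start of the string if there was no previous match), so it fixes where the match must start. When \G is present, the match can't start at some arbitrary later point in the string; when \G is absent, it can.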
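
To make that concrete with the string from the question, here is a minimal sketch. The variable names are only illustrative. It runs the same list-context /g match with and without the anchor:

```perl
use strict;
use warnings;

my $input = "1122a44";

# Without \G, each /g match may start at any later point,
# so the engine skips over the "a" and still finds 44.
my @unanchored = $input =~ /(\d\d)/g;

# With \G, each match must begin exactly where the previous one ended,
# so matching stops for good when it reaches the "a".
my @anchored = $input =~ /\G(\d\d)/g;

print "unanchored: @unanchored\n";   # unanchored: 11 22 44
print "anchored: @anchored\n";       # anchored: 11 22
```

The only difference between the two patterns is the anchor, and that is what removes 44 from the second result.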

It is most useful in parsing a string into discrete parts, where you don't want to skip past other stuff. For instance:

my $string = " a 1 # ";
while () {
    if ( $string =~ /\G\s+/gc ) {
        print "whitespace\n";
    }
    elsif ( $string =~ /\G[0-9]+/gc ) {
        print "integer\n";
    }
    elsif ( $string =~ /\G\w+/gc ) {
        print "word\n";
    }
    else {
        print "done\n";
        last;
    }
}

Output with \G's:

whitespace
word
whitespace
integer
whitespace
done

without:

whitespace
whitespace
whitespace
whitespace
done

Note that I am demonstrating using scalar-context /g matching, but \G applies equally to list context /g matching and in fact the above code is trivially modifiable to use that:

my $string = " a 1 # ";
my @matches = $string =~ /\G(?:(\s+)|([0-9]+)|(\w+))/g;
while ( my ($whitespace, $integer, $word) = splice @matches, 0, 3 ) {
    if ( defined $whitespace ) {
        print "whitespace\n";
    }
    elsif ( defined $integer ) {
        print "integer\n";
    }
    elsif ( defined $word ) {
        print "word\n";
    }
}
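
The scalar-context loop above relies on the c flag: without it, the first failed branch would reset pos() and the next branch would start again from the beginning of the string. A small sketch of that behaviour, again using the question's string and illustrative variable names:

```perl
use strict;
use warnings;

my $input = "1122a44";

# Without /c, the failed match that ends the loop resets pos() to undef.
1 while $input =~ /\G\d\d/g;
my $pos_without_c = pos($input);

# With /c, the failed match leaves pos() where the last success ended.
1 while $input =~ /\G\d\d/gc;
my $pos_with_c = pos($input);

# A later unanchored /g match in scalar context resumes from that saved
# position, so it skips the "a" and finds the remaining pair.
my $next = $input =~ /(\d\d)/g ? $1 : undef;

printf "pos without /c: %s\n", defined $pos_without_c ? $pos_without_c : 'undef';
printf "pos with /c: %s\n", $pos_with_c;
printf "next pair: %s\n", $next;
```

Here pos() is undefined after the first loop and 4 after the second, which is why the final match returns 44 rather than 11.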