igrepstuff - 5 months ago
Perl Question

perlre match capturing group multiple times on the same line

Given the following line:

<tag id='1'><![CDATA[this is a string of text]]><tag id='2'><![CDATA[this is another string of text]]><tag id='3'><![CDATA[this is the last string of text]]>


I'm trying to match the text inside the square brackets of each CDATA section and get every match back without having to split the line first. I can do this by splitting the line (see below), but I'm trying to better understand Perl regex matching and whether a regex alone can do the same.

my $string = qq(<tag id='1'><![CDATA[this is a string of text]]><tag id='2'><![CDATA[this is another string of text]]><tag id='3'><![CDATA[this is the last string of text]]>);
my @splitline = split(/\</, $string);
foreach (@splitline) {
    if ($_ =~ /\!\[CDATA\[(.*?)\]\]/) {
        print "$1\n";
    }
}


The result of the above is:

this is a string of text
this is another string of text
this is the last string of text


If I try this, it only returns the first match.

my $string = qq(<tag id='1'><![CDATA[this is a string of text]]><tag id='2'><![CDATA[this is another string of text]]><tag id='3'><![CDATA[this is the last string of text]]>);
if ($string =~ /\!\[CDATA\[(.*?)\]\]/) {
    print "$1\n";
}


Changing my regex to the following returns no data:

$string =~ /\!\[CDATA\[(.*?)+\]\]/g

Answer

The issue is that you are calling the match in scalar context (i.e., as the condition of the if statement) and without the /g modifier, so only the first match is found. Instead, call it in list context with /g and load all the matches into an array. You can then check the array and print the results.

my $string = qq(<tag id='1'><![CDATA[this is a string of text]]><tag id='2'><![CDATA[this is another string of text]]><tag id='3'><![CDATA[this is the last string of text]]>);

my @matches = $string =~ /\!\[CDATA\[(.*?)\]\]/g;

print join("\n",@matches) if @matches;

OUTPUT

this is a string of text
this is another string of text
this is the last string of text
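
The same list-context behaviour extends to patterns with more than one capturing group: every group from every match is returned in a single flat list. The following is only a sketch under assumptions. The shortened sample string, the extra `id` capture, and the names `@pairs` and `@lines` are illustrative and not part of the original question.

```perl
use strict;
use warnings;

# Shortened sample input (an assumption for illustration).
my $string = qq(<tag id='1'><![CDATA[first]]><tag id='2'><![CDATA[second]]>);

# Two capturing groups with /g in list context: the captures come back
# as one flat list (id, text, id, text, ...).
my @pairs = $string =~ /<tag id='(\d+)'><!\[CDATA\[(.*?)\]\]>/g;

# Walk the flat list two items at a time to rebuild "id: text" lines.
my @lines;
for (my $i = 0; $i < @pairs; $i += 2) {
    push @lines, "$pairs[$i]: $pairs[$i + 1]";
}

print "$_\n" for @lines;
# prints:
# 1: first
# 2: second
```

If each match needs separate handling as it is found, the scalar-context loop is usually easier to read than indexing into a flat list.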

If you really want to call it in scalar context, you will need to iterate over all the matches. The Perl documentation states that in scalar context with the /g modifier, the match tracks the position of each match, so each iteration resumes where the previous one ended.

my $string = qq(<tag id='1'><![CDATA[this is a string of text]]><tag id='2'><![CDATA[this is another string of text]]><tag id='3'><![CDATA[this is the last string of text]]>);

while ($string =~ /\!\[CDATA\[(.*?)\]\]/g){
    print "$1\n";
}
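
To make the position tracking visible, the built-in pos() function can be queried inside the loop. This is a minimal sketch, assuming a shortened sample string; the `@found` array is a hypothetical helper used only to record what was seen.

```perl
use strict;
use warnings;

# Shortened sample input (an assumption for illustration).
my $string = qq(<![CDATA[one]]><![CDATA[two]]>);

my @found;
while ($string =~ /<!\[CDATA\[(.*?)\]\]>/g) {
    # pos() is the offset just past the end of the last successful match,
    # which is where the next iteration resumes searching.
    push @found, [$1, pos($string)];
    printf "matched '%s', next search starts at offset %d\n", $1, pos($string);
}
# Once the match finally fails, pos($string) is reset to undef,
# so a later /g match on the same string starts from the beginning.
# prints:
# matched 'one', next search starts at offset 15
# matched 'two', next search starts at offset 30
```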