Sgt B - 4 months ago
Perl Question

Regex works, but receive warning: matches null string many times in regex errors

I've got a string that has a number of components I need to extract. These are well formed and predictable, but the order in which they appear varies. Below is a snippet that illustrates what the strings may look like and the regex I'm using to extract the information I need. This code works and I get the output expected.

my $str1 = '(test1=cat)(test2=dog)(test3=mouse)'; # prints cat\ndog\nmouse
$str1 = '(test1=cat)(test3=mouse)(test2=dog)(test1=cat)'; # prints cat\ndog\nmouse
$str1 = '(test3=mouse)(test1=cat)'; # prints cat\nempty\nmouse
$str1 = '(test3=mouse)(test2=dog)'; # prints empty\ndog\nmouse
my $pattern1 = '(?=.*\(test1=(.*?)\))*(?=.*\(test2=(.*?)\))*(?=.*\(test3=(.*?)\))*';

if (my @map = $str1 =~ /$pattern1/) {
    foreach my $match (@map) {
        say $match if $match;
        say "empty" if !$match;
    }
}

The expected and received output for the last string above is as follows:

empty
dog
mouse

However, in addition to the expected output, I get the following warnings:

(?=.*\(test1=(.*?)\))* matches null string many times in regex; marked by <-- HERE in m/(?=.*\(test1=(.*?)\))* <-- HERE (?=.*\(test2=(.*?)\))*(?=.*\(test3=(.*?)\))*/ at /path/to/ line 32.
(?=.*\(test2=(.*?)\))* matches null string many times in regex; marked by <-- HERE in m/(?=.*\(test1=(.*?)\))*(?=.*\(test2=(.*?)\))* <-- HERE (?=.*\(test3=(.*?)\))*/ at /path/to/ line 32.
(?=.*\(test3=(.*?)\))* matches null string many times in regex; marked by <-- HERE in m/(?=.*\(test1=(.*?)\))*(?=.*\(test2=(.*?)\))*(?=.*\(test3=(.*?)\))* <-- HERE / at /path/to/ line 32.

This tells me that while my regex works, it may have some problems.

How could I adjust the above regex to continue to work as expected while eliminating the warnings?

Here are a few constraints I have to work with:

  • The order of the results must be maintained (e.g., "test1" will always be the first element of the array)

  • The field names aren't really "testN"; there are a number of unique ones I have to work with, and they are static values

  • Duplicates are fine, but the last one should be used (the above script does this)

I don't normally work with lookarounds so my mistake might be rudimentary (hopefully). Any advice or feedback is much appreciated. Thanks!

Edit - Running Perl 5.20


Matching a look-ahead (?=...) multiple times doesn't make sense. A look-ahead doesn't consume any data from the object string, so if it matches once at a given position it will keep matching there indefinitely.
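To make the zero-width point concrete, here is a minimal sketch. The helper name probe_lookahead is made up for illustration; it runs a single look-ahead from the question and reports what was captured and how many characters the match itself consumed.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured value and the number of
# characters the overall match consumed (end offset minus start offset).
sub probe_lookahead {
    my ($str) = @_;
    return unless $str =~ /(?=.*\(test2=(.*?)\))/;
    return ( $1, $+[0] - $-[0] );
}

my ( $value, $consumed ) = probe_lookahead('(test3=mouse)(test2=dog)');
print "captured '$value', consumed $consumed characters\n";
# prints: captured 'dog', consumed 0 characters
```

The capture is filled in, but the match ends where it started, so repeating the group with * can never make progress through the string.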

The main change that you need to make is to replace (?=.*\(test1=(.*?)\))* etc. with (?=.*\(test1=(.*?)\))?. That just makes your look-ahead "optional", and will get rid of your warnings.

use strict;
use warnings 'all';

use Data::Dump;

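my $pattern = qr/
    (?= .* \( test1= (.*?) \) )?
    (?= .* \( test2= (.*?) \) )?
    (?= .* \( test3= (.*?) \) )?
/x;

# The four sample strings from the question
my @strings = qw/
    (test1=cat)(test2=dog)(test3=mouse)
    (test1=cat)(test3=mouse)(test2=dog)(test1=cat)
    (test3=mouse)(test1=cat)
    (test3=mouse)(test2=dog)
/;

for my $str ( @strings ) {

    next unless my @map = $str =~ /$pattern/;

    # Groups that did not take part in the match come back as undef
    $_ //= 'empty' for @map;

    dd \@map;
}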

["cat", "dog", "mouse"]
["cat", "dog", "mouse"]
["cat", "empty", "mouse"]
["empty", "dog", "mouse"]

However, this sounds like another case of getting a single regex pattern to do too much work. You are writing in Perl, so why not use it?

The following code assumes the same header as the full program above, up to and including the definition of @strings. The for loop is all that I have changed.

for my $str ( @strings ) {
    my @map = map {  $str =~ / \( test$_= ( [^()]* ) \)/x ? $1 : 'empty' } 1 .. 3;
    dd \@map;
}


["cat", "dog", "mouse"]
["cat", "dog", "mouse"]
["cat", "empty", "mouse"]
["empty", "dog", "mouse"]
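One thing to watch: a plain match like the one in the loop above finds the leftmost occurrence of each field, whereas the question asks for the last one when a field is repeated. The sample strings only repeat identical values, so the output is the same either way. Below is a rough sketch of one way to handle it, assuming a list of field names (@fields and extract_fields are hypothetical names, and the real field names would replace test1 to test3). A greedy .* in front of the field pushes the match to the last occurrence, and \Q...\E protects any regex metacharacters in a name.

```perl
use strict;
use warnings;

# Hypothetical field names, listed in the order the results must appear.
my @fields = qw/ test1 test2 test3 /;

# Returns one value per field, using the last occurrence of a repeated
# field and 'empty' for a field that is missing.
sub extract_fields {
    my ($str) = @_;
    my @values;
    for my $field (@fields) {
        # The leading greedy .* moves the match to the last "(field=...)".
        push @values,
            $str =~ / .* \( \Q$field\E = ( [^()]* ) \) /xs ? $1 : 'empty';
    }
    return @values;
}

print join( ',', extract_fields('(test1=cat)(test3=mouse)(test1=lion)') ), "\n";
# prints: lion,empty,mouse
```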

Or it may be that something different is appropriate. Hashes are useful for this sort of thing.

for my $str ( @strings ) {
    my %map = $str =~ / \( ( test\d+ ) = ( [^()]* ) \) /gx;
    dd \%map;
}


{ test1 => "cat", test2 => "dog", test3 => "mouse" }
{ test1 => "cat", test2 => "dog", test3 => "mouse" }
{ test1 => "cat", test3 => "mouse" }
{ test2 => "dog", test3 => "mouse" }
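If the fixed ordering from the question is still needed, the hash can be read back out in a known field order. This is only a sketch under some assumptions: @fields and ordered_values are made-up names, and the key pattern assumes that field names contain no parentheses or equals signs. Because a list assignment to a hash keeps the last value for a repeated key, duplicates resolve to the last occurrence.

```perl
use strict;
use warnings;

# Hypothetical field names, listed in the order the results must appear.
my @fields = qw/ test1 test2 test3 /;

# Returns one value per entry in @fields, with 'empty' for missing fields.
sub ordered_values {
    my ($str) = @_;
    # Collect every (key=value) pair; a repeated key keeps its last value.
    my %map = $str =~ / \( ( [^()=]+ ) = ( [^()]* ) \) /gx;
    return map { $map{$_} // 'empty' } @fields;
}

print join( ', ', ordered_values('(test3=mouse)(test1=cat)') ), "\n";
# prints: cat, empty, mouse
```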