Mark M - 1 year ago
Perl Question

Search for substring and store another part of the string as variable in Perl

I am revamping an old mail tool and adding MIME support. I have a lot of it working, but I'm a Perl dummy and the regex stuff is losing me.

I had:

foreach ( @{$body} ) {

    next if /^$/;

    if ( /NEMS/i ) {
        /.*?(\d{5,7}).*/;
        $nems = $1;
        next;
    }

    if ( $delimit ) {
        next if (/$delimit/ && ! $tp);
        last if (/$delimit/ && $tp);

        $tp = 1, next if /text.plain/;
        $tp = 0, next if /text.html/;
        s/<[^>]*>//g;
        $newbody .= $_ if $tp;
    } else {
        s/<[^>]*>//g;
        $newbody .= $_ ;
    }
} # End Foreach


Now I have $body_text as the plain text mail body thanks to MIME::Parser. So now I just need this part to work:

foreach ( @{$body_text} ) {

    next if /^$/;

    if ( /NEMS/i ) {
        /.*?(\d{5,7}).*/;
        $nems = $1;
        next;
    }
} # End Foreach


The actual challenge is to find NEMS=12345 or NEMS=1234567 and set $nems=12345 if found. I think I have a very basic syntax problem with the test because I'm not exposed to Perl very often.

A coworker suggested:

foreach (split(/\n/,$body_text)){

    next if /^$/;

    if ( /NEMS/i ) {
        /.*?(\d{5,7}).*/;

        $nems = $1;
        next;

    }
}


Which seems to be working, but it may not be the preferred way?

edit:

So this is the most current version based on tips here and testing:

foreach (split(/\n/,$body_text)){

    next if /^$/;

    if ( /NEMS/i ) {
        /^\s*NEMS\s*=\s*(\d+)/i;

        $nems = $1;
        next;

    }
}

Answer

Match the last two digits as optional, capture the first five, and assign the capture directly:

($nems) = /.*? (\d{5}) (?: \d{2} )?/x;  # /x allows spaces inside

The construct (?: ) only groups what's inside, without capture. The ? after it means to match that zero or one time. We need parens so that it applies to that subpattern only. So the last two digits are optional -- five digits or seven digits match. I also removed the unneeded .* at the end.
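
As a quick check of how the optional group behaves, here is a minimal sketch. The helper name first_five and the sample strings are made up for illustration; they are not part of the original mail tool.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first five digits of a digit run,
# or undef when the line has no run of at least five digits.
sub first_five {
    my ($line) = @_;
    my ($nems) = $line =~ /.*? (\d{5}) (?: \d{2} )?/x;
    return $nems;
}

for my $line ( 'NEMS=12345', 'NEMS=1234567', 'NEMS=123' ) {
    my $nems = first_five($line);
    print "$line -> ", ( defined $nems ? $nems : '(no match)' ), "\n";
}
```

Both the five-digit and the seven-digit sample give 12345, since only the first five digits sit inside the capturing parens; the three-digit sample does not match at all.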

However, from what you say, it appears that the whole thing can be simplified:

if ( ($nems) = /^\s*NEMS \s* = \s* (\d{5}) (?:\d{2})?/ix ) { next }

where there is now no need for if (/NEMS/), and I've adjusted to the clarification that NEMS is at the beginning and that there may be spaces around =. Then you can also say:

my $nems;

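foreach ( split /\n/, $body_text ) {
    # ...
    # Capture into a temporary so that a non-matching line
    # does not reset $nems to undef.
    if ( my ($found) = /^\s*NEMS\s*=\s*(\d{5})(?:\d{2})?/i ) {
        $nems = $found;
        next;
    }
    # ...
}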

which includes the clarification that the new $body_text is a multiline string.

Note that $nems needs to be declared outside of the loop, as shown.

This regex allows yet more digits to follow; it will also match on 6 or 8 digits (but capture only the first five). This is what your trailing .* in the regex implies.

Edit: It's been clarified that there can only be 5 or 7 digits. The regex can then be tightened to check whether the input is as expected, but it should work as it stands, too.
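
One way the tightening mentioned above could look is sketched below. It assumes that a line with any other number of digits should be rejected rather than truncated; the helper name strict_nems is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: accepts exactly 5 or 7 digits after "NEMS =".
sub strict_nems {
    my ($line) = @_;
    # (?!\d) is a negative lookahead: the digit run must end here,
    # so 6, 8, or more digits make the whole match fail.
    my ($nems) = $line =~ /^\s*NEMS\s*=\s*(\d{5})(?:\d{2})?(?!\d)/i;
    return $nems;    # first five digits, or undef if the line is rejected
}

for my $line ( 'NEMS=12345', 'NEMS = 1234567', 'NEMS=123456', 'NEMS=12345678' ) {
    my $nems = strict_nems($line);
    print "$line -> ", ( defined $nems ? $nems : '(rejected)' ), "\n";
}
```

As in the original regex, only the first five digits are captured; the lookahead just adds the check on the total length.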


A few notes; let me know if more would be helpful:

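  • In list context the match operator returns the list of captures, so we need the parens in ($nems) = /.../; without them the assignment is in scalar context and $nems only receives true or false.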

  • The ($nems) = /.../ syntax is a nice shortcut for ($nems) = $_ =~ /.../;
    If you are matching on a variable other than $_ then you need the whole thing.

  • You always want to start Perl programs with

    use warnings 'all';
    use strict;

    This catches many mistakes directly (such as misspelled variable names) and generally results in better code.
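
The notes above can be seen together in a small sketch. The variable names and the sample line are made up; the point is only the difference between list and scalar context when matching on a named variable.

```perl
use strict;
use warnings;

my $line = 'NEMS = 12345';    # made-up sample line

# List context: the parens on the left make the match return its captures.
my ($captured) = $line =~ /^\s*NEMS\s*=\s*(\d+)/i;

# Scalar context: without the parens we only get a success flag.
my $flag = $line =~ /^\s*NEMS\s*=\s*(\d+)/i;

print "captured: $captured\n";    # the digits from the line
print "flag:     $flag\n";        # 1, since the match succeeded
```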


A later clarification states that all digits following = need to be captured into $nems (there may be 5, 7, 8, 9, or 10 digits, but not 6). Then the regex is simply

($nems) = /^\s*NEMS\s*=\s*(\d+)/i;

where \d+ means a digit, one or more times, so it matches a string of digits (and the match fails if there are none).
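
Putting the \d+ version into a loop over a multiline body might look like the sketch below. The helper find_nems and the sample body are hypothetical, and returning on the first match is an assumption about what the tool wants. The capture goes into a lexical declared in the condition, because a failed match assigned directly to an outer ($nems) would set it back to undef.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the digits from the first NEMS line,
# or undef if the body has no such line.
sub find_nems {
    my ($body_text) = @_;
    for my $line ( split /\n/, $body_text ) {
        # The list assignment is true only when the match succeeded.
        if ( my ($nems) = $line =~ /^\s*NEMS\s*=\s*(\d+)/i ) {
            return $nems;
        }
    }
    return;
}

my $sample = "Hello,\n\n  nems = 1234567\nRegards\n";    # made-up mail body
my $nems   = find_nems($sample);
print defined $nems ? "NEMS is $nems\n" : "no NEMS found\n";
```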
