BrianH - 6 months ago
Perl Question

Is there a better way to write Perl regexes with /x so the code is still easy to read?

I ran Perl::Critic on one of my scripts, and got this message:

Regular expression without "/x" flag at line 21, column 26. See page 236 of PBP.


I looked up the policy information here, and I understand that writing regular expressions in extended mode will help anyone who is looking at the code.

However, I am stuck on how to convert my code to use the /x flag.

CPAN Example:

# Match a single-quoted string efficiently...

m{'[^\\']*(?:\\.[^\\']*)*'}; #Huh?

# Same thing with extended format...

m{
    '            # an opening single quote
    [^\\']*      # any non-special chars (i.e. not backslash or single quote)
    (?:          # then all of...
        \\ .     # any explicitly backslashed char
        [^\\']*  # followed by any non-special chars
    )*           # ...repeated zero or more times
    '            # a closing single quote
}x;


This makes sense if you only look at the regex.

My Code:

if ($line =~ /^\s*package\s+(\S+);/ ) {


I am not exactly sure how to use an extended regex inside of an if statement. I can write it like this:

if (
    $line =~ /
        ^\s*     # starting with zero or more spaces
        package
        \s+      # at least one space
        (\S+)    # capture any non-space characters
        ;        # ending in a semi-colon
    /x
)
{


And this works, but I think this is almost harder to read than the original. Is there a better way (or a best practice way) to write this? I guess I could create a variable using qr//.

I'm not really looking for advice on re-writing this specific regex (although if I can improve it, I'll take advice) - I'm more looking for advice on how to expand a regex inside of an if statement.

I know Perl::Critic is just a guideline, but it would be nice to follow it.

Thanks in advance!

EDIT:
So after receiving a few answers, it became clear to me that making a regex multi-line with comments is not always necessary. People who understand basic regex should be able to understand what my example was doing; the comments I added were a little unnecessary and verbose. I like the idea of using the extended regex flag on a single line, with spaces embedded in the regex to make each part a little clearer.
Thanks for all the input!

Answer

Well, I really don't think you should waste vertical screen real estate on this. On the other hand, if I were to write this pattern over several lines, I would use braces as the delimiters and indent the pattern:

if ($line =~ m{
        \A \s*
        package
        \s+
        (\S+)
        \s* ;
    }x 
) {

IMHO, the following version is perfectly fine and still gets you the benefit of m//x:

if ( $line =~ m{ \A \s* package \s+ (\S+) \s* ; }x ) {

The comments are completely unnecessary in this case because you are not doing anything tricky. I did add \s* before the semi-colon because sometimes people set the semi-colon apart from the package name and that should not throw off your match.
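
If you would rather keep the if line itself short, the qr// idea from the question works too. Here is a minimal sketch of that, not the only way to do it. The names $PACKAGE_DECL and package_name are made up for illustration; the pattern is the same one as above.

```perl
use strict;
use warnings;

# Hypothetical name for the precompiled pattern (same regex as above).
my $PACKAGE_DECL = qr{
    \A \s*
    package \s+
    (\S+)        # the package name
    \s* ;        # optional whitespace, then the semi-colon
}x;

# Hypothetical helper: returns the package name, or undef if the
# line is not a package declaration.
sub package_name {
    my ($line) = @_;
    return $line =~ $PACKAGE_DECL ? $1 : undef;
}

for my $line ('package Foo::Bar;', '  package Foo::Bar ;', 'use strict;') {
    my $name = package_name($line);
    printf "%-24s => %s\n", "'$line'", defined $name ? $name : '(no match)';
}
```

The condition then becomes if ( $line =~ $PACKAGE_DECL ) { ... }, so the pattern can take as many lines as it needs without cluttering the if statement.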