saint1729 - 3 months ago 17
Perl Question

Regular expression that matches any word that starts with pre and ends in al

The following regular expression gives me the correct results in the Notepad++ editor, but when I use it in the Perl program below I get wrong results. Could someone give me the right answer and an explanation?

The file I used to test my pattern is here:

http://sainikhil.me/stackoverflow/dictionaryWords.txt

Regular expression: ^Pre(.*)al(\s*)$

Perl program:

use strict;
use warnings;

sub print_matches {
    my $pattern = "^Pre(.*)al(\s*)\$";
    my $file = shift;

    open my $fp, '<', $file or die "Cannot open $file: $!";

    while (my $line = <$fp>) {
        if ($line =~ m/$pattern/) {
            print $line;
        }
    }
}

print_matches @ARGV;

Answer

The problem comes from assigning the pattern to a variable before using it as a regex, and putting it in a double-quoted string when you do so.

This is also why you had to escape the $: in a double-quoted string, a bare $ means you want to interpolate the value of a variable (e.g., my $str = "foo$bar";).

It causes your problem because, inside a double-quoted string, the backslash in \s is also treated as an escape. Since \s is not a recognized string escape, the backslash is dropped and you are left with just a plain s:

$ perl -E 'say "^Pre(.*)al(\s*)\$";'
^Pre(.*)al(s*)$

As a result, when the regex is executed, it looks for zero or more s characters rather than zero or more whitespace characters.
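To see the difference concretely, here is a small sketch that runs the pattern your double-quoted string actually produces against the one you intended. The sample lines are made up for illustration. I'm assuming the misbehaving lines in your file have trailing whitespace or Windows-style CRLF line endings (I haven't checked the linked file), which is a common reason for an editor and Perl to disagree. Note also that with use warnings enabled, Perl should already have reported an "Unrecognized escape \s passed through" warning when compiling your original program.

```perl
use strict;
use warnings;

# The pattern as your double-quoted string actually produces it ("\s" -> "s"),
# written with single quotes here so the result is explicit.
my $broken  = '^Pre(.*)al(s*)$';
# The pattern you intended, with a real \s.
my $correct = '^Pre(.*)al(\s*)$';

# Hypothetical sample lines: plain, trailing space, CRLF ending, trailing "ss".
my @lines = ("Prenatal\n", "Prenatal \n", "Prenatal\r\n", "Prenatalss\n");

for my $line (@lines) {
    (my $shown = $line) =~ s/\r/\\r/g;   # make \r visible in the output
    $shown =~ s/\n/\\n/g;                # make \n visible in the output
    printf "%-16s broken=%s correct=%s\n",
        $shown,
        ($line =~ m/$broken/  ? 'yes' : 'no'),
        ($line =~ m/$correct/ ? 'yes' : 'no');
}
```

The broken pattern rejects the lines with a trailing space or \r\n, and it accepts Prenatalss, because after al it only allows a run of s characters before the end of the line.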

The most direct fix for this would be to escape the backslash:

$ perl -E 'say "^Pre(.*)al(\\s*)\$";'
^Pre(.*)al(\s*)$

A better fix would be to use single quotes instead of double quotes and not escape the $:

$ perl -E "say '^Pre(.*)al(\s*)$';"
^Pre(.*)al(\s*)$
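As a quick check that both string-based fixes really produce the same pattern, here is a minimal sketch. The CRLF-terminated sample line is my own assumption, not something taken from your file.

```perl
use strict;
use warnings;

# Fix 1: double quotes, with the backslash itself escaped.
my $escaped = "^Pre(.*)al(\\s*)\$";
# Fix 2: single quotes, where neither \s nor $ is special.
my $single  = '^Pre(.*)al(\s*)$';

print(($escaped eq $single) ? "identical\n" : "different\n");

# Both strings now contain a real \s, so trailing whitespace
# (here a hypothetical CRLF line ending) is accepted.
for my $pattern ($escaped, $single) {
    print(("Prenatal\r\n" =~ m/$pattern/) ? "match\n" : "no match\n");
}
```

It should print identical, followed by match twice.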

The best fix would be to use the qr (quote regex) operator instead of single or double quotes. The only downside is that the result is a little less human-readable if you print it out later to verify the contents of the regex (which I assume is why you're putting it into a variable in the first place):

$ perl -E "say qr/^Pre(.*)al(\s*)$/;"
(?^u:^Pre(.*)al(\s*)$)
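Putting the qr version back into your program might look roughly like this. It is a sketch rather than a drop-in replacement: compiling the pattern once outside the sub, using the three-argument form of open with an error check, and having print_matches take an already-open filehandle are my own choices, not something your original code requires.

```perl
use strict;
use warnings;

# Compile the pattern once with qr//, so no string-quoting rules
# can alter the \s or the $ anchor.
my $pattern = qr/^Pre(.*)al(\s*)$/;

# Print every line read from $fh that matches $pattern.
sub print_matches {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        print $line if $line =~ $pattern;
    }
}

# Only run when a file name is given on the command line.
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fp, '<', $file or die "Cannot open '$file': $!";
    print_matches($fp);
    close $fp;
}
```

Run it as perl yourscript.pl dictionaryWords.txt (the script name is a placeholder). The pattern is still case-sensitive, so words starting with a lowercase pre will not match; add the /i modifier (qr/^Pre(.*)al(\s*)$/i) if you want those as well.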

Or, of course, don't put it into a variable at all and just do your matching with:

if($line =~ m/^Pre(.*)al(\s*)$/) ...