Bravado - 1 month ago
Perl Question

Perl, search string for occurrence of items of array

For a file filter, I want to use an array of words and check whether each line matches any of them.

I already have a rather straightforward approach to this (only the essential matching part):

# check if any of the @words is found in $term

@words= qw/one two three/;
$term= "too for the show";

# the following looks very C like

$size= @words;
$found= 0;

for ($i= 0; $i<$size && !$found; $i++) {
$found|= $term=~ /$words[$i]/;
}

printf "found= %d\n", $found;

Having seen a lot of arcane syntax and solutions in Perl, I'm wondering whether there are (or rather, what are) more compact ways of writing this.


Use Regexp::Assemble to turn the search into one regex. That way each string only has to be scanned once, which makes it more efficient for large numbers of lines.

Regexp::Assemble is preferable to doing it manually. It has a full API of things you might want to do with such a regex, it can handle edge cases, and it can intelligently compile into a more efficient regex.

For example, the program below produces (?^:\b(?:t(?:hree|wo)|one)\b), which results in less backtracking. This becomes VERY important as your word list increases in size. Recent versions of Perl, about 5.14 and up, will do this kind of optimization for you.

use strict;
use warnings;
use v5.10;

use Regexp::Assemble;

# Wrap each word in \b (word break) so only the full word is
# matched. 'one' will match 'money' but '\bone\b' won't.
my @words = qw(one two three);

# These lines simulate reading from a file.
my @lines = (
    "won for the money\n",
    "two for the show\n",
    "three to get ready\n",
    "now go cat go!\n"
);

# Assemble all the words into one regex.
my $ra = Regexp::Assemble->new;
$ra->add("\\b$_\\b") for @words;

for my $line (@lines) {
    print $line if $line =~ $ra;
}

Also note the foreach-style loop used to iterate over an array, and the use of a statement modifier (the trailing if).
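To make those two idioms concrete, here is a minimal sketch of the question's C-like loop rewritten with them, using only core Perl. The helper name found_any is hypothetical, and the word list is assumed to be one, two, three, which is what the assembled regex shown earlier implies.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if any word occurs in $term, else 0.
sub found_any {
    my ($term, @words) = @_;

    # foreach-style loop: no index variable or size bookkeeping.
    for my $word (@words) {
        # Statement modifier: the condition trails the statement.
        # \Q...\E makes $word match literally, not as a regex.
        return 1 if $term =~ /\Q$word\E/;
    }
    return 0;
}

# Assumed word list (one, two, three).
my @words = qw(one two three);

printf "found= %d\n", found_any("too for the show", @words);
printf "found= %d\n", found_any("two for the show", @words);
```

Returning from inside the loop replaces the && !$found test in the original: the scan stops at the first hit. Note that this version still matches substrings, so a term containing money would count as a hit for one.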

Finally, I used \b to ensure that only whole words are matched, not substrings inside longer words like money.
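For comparison, this is roughly what doing it manually looks like, and it also shows the effect of \b. It is only a sketch in core Perl, not a replacement for Regexp::Assemble: the variable names are made up for illustration, and the word list is again assumed to be one, two, three.

```perl
use strict;
use warnings;

# Assumed word list and the sample lines from the program above.
my @words = qw(one two three);
my @lines = (
    "won for the money\n",
    "two for the show\n",
    "three to get ready\n",
    "now go cat go!\n",
);

# Escape each word so it matches literally, then join into one alternation.
my $alternation = join '|', map { quotemeta($_) } @words;

my $loose = qr/(?:$alternation)/;        # substrings allowed
my $whole = qr/\b(?:$alternation)\b/;    # whole words only

# grep keeps the lines that match each compiled regex.
my @loose_hits = grep { $_ =~ $loose } @lines;
my @whole_hits = grep { $_ =~ $whole } @lines;

print "without \\b: $_" for @loose_hits;
print "with \\b:    $_" for @whole_hits;
```

Without \b, "won for the money" is reported because one occurs inside money. With \b, only the lines containing two and three as whole words remain.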