AnnaSchumann - 8 months ago
Perl Question

Skipping lines based on multiple listed criteria

I'm iterating over a file <$IN>, processing it line by line. There are some lines I wish to skip based on the values held within their split fields, which I currently accomplish via:

while (<$IN>) {
    chomp $_;
    my (@F) = split("\t", $_);
    next if ($F[0] == 1 && $F[1] > 20 && $F[1] < 40);
    # ... process the remaining lines ...
}


However, I have multiple such lines and criteria to check against. I'd much rather load a .txt file containing the test values, i.e.:

1 20 40
1 63 68
2 1 10
2 12 18
3 3 9


And find a shorter way of testing any given line against these ranges without the need for many lines as I currently have:

next if ($F[0] == 1 && $F[1] > 20 && $F[1] < 40);
next if ($F[0] == 1 && $F[1] > 63 && $F[1] < 68);
next if ($F[0] == 2 && $F[1] > 1 && $F[1] < 10);
next if ($F[0] == 2 && $F[1] > 12 && $F[1] < 18);
next if ($F[0] == 3 && $F[1] > 3 && $F[1] < 9);


Could anybody provide an example of how to do this in a more succinct manner?

Answer Source

You can move that configuration into a file without problems. I would use a hash (ref) keyed on the value of the first field, with an array ref holding the min and max values of each range. Since there can be more than one pair of values per key, you need an array of arrays, like this:

use strict;
use warnings;
use Data::Printer;

# open my $fh, '<', 'config.txt' or die $!;
my $config;
while (my $line = <DATA>) {
    chomp $line;
    my ( $col, $min, $max )= split /\s+/, $line;
    push @{ $config->{$col} }, [ $min, $max ];
}

p $config;
__DATA__
1   20  40
1   63  68
2   1   10
2   12  18
3   3   9

The data structure will look like this:

\ {
    1   [
        [0] [
            [0] 20,
            [1] 40
        ],
        [1] [
            [0] 63,
            [1] 68
        ]
    ],
    2   [
        [0] [
            [0] 1,
            [1] 10
        ],
        [1] [
            [0] 12,
            [1] 18
        ]
    ],
    3   [
        [0] [
            [0] 3,
            [1] 9
        ]
    ]
}
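
To read the ranges from a real file instead of DATA, the same loop can be wrapped in a sub that takes a filehandle. This is only a minimal sketch: the name load_ranges and the file name ranges.txt are assumptions, not part of the original code, and an in-memory filehandle stands in for the file so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: read "key min max" triples from any filehandle and
# return a hash ref mapping each key to a list of [min, max] pairs.
sub load_ranges {
    my ($fh) = @_;
    my %ranges;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /\S/;    # ignore blank lines
        my ( $key, $min, $max ) = split ' ', $line;
        push @{ $ranges{$key} }, [ $min, $max ];
    }
    return \%ranges;
}

# An in-memory filehandle stands in for the real file here. With a file on
# disk this would be: open my $fh, '<', 'ranges.txt' or die $!;
my $text = "1 20 40\n1 63 68\n2 1 10\n2 12 18\n3 3 9\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $config = load_ranges($fh);
close $fh;

printf "%d range(s) for key 1\n", scalar @{ $config->{1} };
```

Passing a filehandle rather than a file name keeps the sub usable with DATA, STDIN, or a regular file.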

The hash makes it easy to use exists to check whether the field has any ranges at all. You can then use it like this.

use strict;
use warnings;

# see the code example above
my $config = {
    1 => [ [ 20, 40 ], [ 63, 68 ] ],
    2 => [ [ 1,  10 ], [ 12, 18 ] ],
    3 => [ [ 3,  9 ] ],
};

sub skip_line {
    my @F = @_;

    return unless exists $config->{$F[0]}; # nothing to skip in this field

    foreach my $pair (@{ $config->{$F[0]} }) {
        return 1 if $F[1] > $pair->[0] && $F[1] < $pair->[1];
    }

    return; # no range matched, so keep the line
}

while (<DATA>) {
    chomp;
    my @F = split /\s+/; # \t

    next if skip_line(@F);

    print "$_\n"; # output unskipped lines (chomp removed the newline)
}

__DATA__
1 21 30
2 19 20

You can of course use a shorter form instead of the skip_line function, as shown in choroba's answer.
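
That answer is not reproduced here. One possible shorter form, as a sketch only, replaces skip_line with any from List::Util (in core since Perl 5.20). The sample lines in @lines and the @kept array are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(any);

my $config = {
    1 => [ [ 20, 40 ], [ 63, 68 ] ],
    2 => [ [ 1,  10 ], [ 12, 18 ] ],
    3 => [ [ 3,  9 ] ],
};

# Illustrative tab-separated input lines (not from the original post)
my @lines = ( "1\t21\t30", "1\t50\t30", "2\t19\t20", "4\t5\t6" );
my @kept;

for my $line (@lines) {
    my @F = split /\t/, $line;

    # "// []" supplies an empty list for keys that have no configured ranges
    next if any { $F[1] > $_->[0] && $F[1] < $_->[1] } @{ $config->{ $F[0] } // [] };

    push @kept, $line;
}

print "$_\n" for @kept;
```

Only the first sample line is skipped, since 21 lies strictly between 20 and 40; the comparisons are exclusive at both ends, as in the original conditions.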
