onlyf - 6 months ago
Perl Question

Grabbing one character after the Nth pattern match - PERL

I have a sample file structured like the one below, and I would like to perform some operations on it:

1112283569;AOEEEEAOAO.;300012299419;0030000302;ALLE;0.00;0.00;0.00;0.00;79149449.66;0.00;7914944
1112283569;AOEEEEAOAO.;300012;;;;AAAAA299419;*;;0.00;0.00;0.00;0.00;79149449.66;0.00;79149449.66
1112283569;AOEEEEAOAO.;*;*;;0.00;0.;;;;;;;;;00;0.00;0.00;79149449.66;0.00;79149449.66;0.00;79149
*;CON;*;0030000302;ΑLLEO;0.00;0.00;0.00;0.00;79149449.66;0.00;79149449.66;0.00;79149449.66;0.00
;CONE:;*;*;;0.00;0.00;0.00;0.00;79149449.66;0.00;79149449.66;0.00;;;79149449.66;0.00


I'm trying to come up with a solution for this. I need to read a file like the one above, delimited by ';', and check the character after the 3rd delimiter on each line, one line at a time. That character is not in a fixed column, so I need some way to capture the character after the Nth delimiter (;). I think I might be able to do this with a regex.

I.e., for the sample input above:

Line 1 - Doesn't meet condition
Line 2 - Doesn't meet condition
Line 3 - Meets condition
Line 4 - Doesn't meet condition
Line 5 - Meets condition

Finally, it would output something like:

1112283569;AOEEEEAOAO.;*;*;;0.00;0.;;;;;;;;;00;0.00;0.00;79149449.66;0.00;79149449.66;0.00;79149
;CONE:;*;*;;0.00;0.00;0.00;0.00;79149449.66;0.00;79149449.66;0.00;;;79149449.66;0.00


(Only the lines where the first character after the third delimiter is *.)
I've found this type of regex, but I'm not sure it applies in this situation:

/\%(^\%([^ ]* \)\{6\}\)\@<=.

Answer

You can do it simply by splitting each line on ; and then checking the first character of the required field.

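use strict;
use warnings;

my $char = '*';    # character to look for
my $nth  = 3;      # check the field right after the 3rd ';'

my $file = 'data_delim.txt';
open my $fh, '<', $file or die "Cannot open $file -- $!";

while (my $line = <$fh>) 
{
    # At most $nth+1 fields; the last one holds the rest of the line
    my @fields = split ';', $line, $nth+1;

    # Lines with fewer than $nth delimiters have no such field (avoids an undef warning)
    next unless defined $fields[$nth];

    # \Q makes $char match literally, so '*' is not taken as a quantifier
    if ($fields[$nth] =~ m/^\Q$char/) {
        print $line;
    }
}
close $fh;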

The $nth above is the "N" from the question, 3 in this example. Passing $nth+1 as the last (LIMIT) argument tells split to return at most N+1 fields, so it stops after the Nth ; and leaves the rest of the line in the last field. The \Q escapes the *, so it is matched literally instead of acting as a regex quantifier; see quotemeta. If you only need the field being checked, you can shorten the loop body to

print $line if (split ';', $line)[3] =~ m/^\Q$char/; # or /^\*/

I suspect there may be more to the task than the question shows, so the full version above keeps all the preceding fields in @fields as well.
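To see what the LIMIT argument does, here is a small standalone sketch that applies the same split to the third sample line. The line is hard-coded here rather than read from data_delim.txt, purely for illustration:

```perl
use strict;
use warnings;

# Third line of the sample input from the question (hard-coded for the demo)
my $line = '1112283569;AOEEEEAOAO.;*;*;;0.00;0.;;;;;;;;;00;0.00;0.00;79149449.66;0.00;79149449.66;0.00;79149';
my $nth  = 3;

# With LIMIT = $nth + 1, split produces at most $nth + 1 fields;
# the last field keeps the rest of the line, unsplit.
my @fields = split /;/, $line, $nth + 1;

print scalar(@fields), "\n";             # 4
print "$fields[$nth]\n";                 # *;;0.00;0.;;... (rest of the line)
print substr($fields[$nth], 0, 1), "\n"; # * (the character right after the 3rd ';')
```

Without the limit, split would break the whole line apart (and drop trailing empty fields), which is more work than this check needs.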

With the sample input saved in data_delim.txt, this prints

1112283569;AOEEEEAOAO.;*;*;;0.00;0.;;;;;;;;;00;0.00;0.00;79149449.66;0.00;79149449.66;0.00;79149
;CONE:;*;*;;0.00;0.00;0.00;0.00;79149449.66;0.00;79149449.66;0.00;;;79149449.66;0.00

I went by the sample input and output because I didn't fully understand the description; I hope this is the correct interpretation of the question.
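Since the question asks about a regex: the pattern quoted there is written in Vim's regex syntax (\%( ... \), \@<=, \{6\}) and counts space-separated fields, so it does not carry over to Perl as written. Below is a minimal sketch of a regex-only check in Perl, under the same reading of the problem (the first character after the third ; must be *). The helper name char_after_nth_delim_is is hypothetical, and the sample lines are hard-coded instead of read from data_delim.txt:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the first character after the $nth ';' is $char.
# (?:[^;]*;){$nth} consumes exactly $nth fields, each with its trailing ';',
# because [^;]* can never run past a ';'.
sub char_after_nth_delim_is {
    my ($line, $nth, $char) = @_;
    return $line =~ /^(?:[^;]*;){$nth}\Q$char\E/ ? 1 : 0;
}

# Sample lines from the question
my @sample = (
    '1112283569;AOEEEEAOAO.;300012299419;0030000302;ALLE;0.00;0.00;0.00;0.00;79149449.66;0.00;7914944',
    '1112283569;AOEEEEAOAO.;300012;;;;AAAAA299419;*;;0.00;0.00;0.00;0.00;79149449.66;0.00;79149449.66',
    '1112283569;AOEEEEAOAO.;*;*;;0.00;0.;;;;;;;;;00;0.00;0.00;79149449.66;0.00;79149449.66;0.00;79149',
    '*;CON;*;0030000302;ΑLLEO;0.00;0.00;0.00;0.00;79149449.66;0.00;79149449.66;0.00;79149449.66;0.00',
    ';CONE:;*;*;;0.00;0.00;0.00;0.00;79149449.66;0.00;79149449.66;0.00;;;79149449.66;0.00',
);

# Keep only the lines that meet the condition (lines 3 and 5 here)
my @kept = grep { char_after_nth_delim_is($_, 3, '*') } @sample;
print "$_\n" for @kept;
```

In the answer's while loop, the same test would read print $line if $line =~ /^(?:[^;]*;){$nth}\Q$char/; . The split version is arguably easier to read; the regex avoids building the field list at all.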