Senthil Kumar - 27 days ago
Perl Question

Perl - Conditional splitting of a file into multiple files based on a floating-point value

I want to split a file's contents into multiple output files by comparing the floating-point number that appears as a string in the last line of each block. For example, the file below contains more than 100K such lines:

Start:/abc/def
.....
End 1.2
Start:/xyz/uvw
.....
End 2.8


I want to print every line from Start to End to OUTFILE1 if End contains a value between 1 and 1.9, and to OUTFILE2 if End contains a value between 2 and 2.9. Likewise, multiple output files have to be generated based on discrete ranges of the floating-point value up to 10, i.e. 0-1, 1-2, 2-3 and so on. If several blocks have values in the same range, their entries should be appended to the same output file.

The code I tried below does not compare the floating-point number correctly, and it also fails to flush the array contents to the right output file for each block. Any suggestions on how to fix it?

foreach $lineIn(@file1_list) {
    $_ = $lineIn;

    if (/Start:/) {
        $pattern1 = 1;
    } elsif(/End\s/) {
        my @slackno = split / \s + /, $_;
        $pattern2 = 1;
        push(@buflines, $_);
    }
    if ($pattern1 = ~1 and $pattern2 = ~0) {
        push(@buflines, $_);
    } else {
        $pattern1 = 0;
        $pattern2 = 0;
    }
}
if ($slackno[3] >= 2.0 and $slackno[3] <= 2.9) {
    foreach(@buflines) {
        print FILE2 $_;
    }
}
close(FILE2);

Answer

After you capture the number, extract its integer part, then subtract that from the number to see whether the fractional part is <= 0.9. Then use that integer, or the integer + 1, for the file name.

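use warnings;
use strict;

my $file = 'data.txt';

open my $fh, '<', $file or die "Can't open $file: $!";

my @buff;    # lines of the current block, up to and including its End line
while (<$fh>)
{
    push @buff, $_;

    # An "End <number>" line closes the block; capture the number
    if (my ($num) = /^End\s+(.*)/)
    {
        my $N = int $num;
        # Fractional part up to 0.9 goes to file N, anything above goes to N+1
        my $fout = 'name_'.  ($num - $N <= 0.9 ? $N : $N+1 ) . '.txt';

        # Append, so that blocks in the same range accumulate in one file
        open my $fh_out, '>>', $fout or die "Can't open $fout: $!";
        print $fh_out $_ for @buff;

        @buff = ();    # start collecting the next block
    }
}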

This expects a line of the form End num to mark the end of a block. The file is opened for append (>>) since blocks within the same range should be appended to the same file. Opening the file this way (in >> mode) also creates it if it doesn't already exist, so that takes care of both possibilities.
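With 100K+ lines, reopening the output file for every block may add up. A minimal sketch of one way around that, assuming it is acceptable to keep at most a dozen or so handles open at once: cache the append-mode handles in a hash keyed by file name. The names %fh_for, out_handle, write_block and close_all are made up for this illustration and are not part of the code above.

```perl
use strict;
use warnings;

# Cache of open output handles, keyed by file name (illustrative name).
my %fh_for;

# Return an append-mode handle for $fout, opening it only on first use.
sub out_handle {
    my ($fout) = @_;
    unless ($fh_for{$fout}) {
        open my $fh_out, '>>', $fout or die "Can't open $fout: $!";
        $fh_for{$fout} = $fh_out;
    }
    return $fh_for{$fout};
}

# Append one buffered block (a list of lines) to its output file.
sub write_block {
    my ($fout, @lines) = @_;
    print { out_handle($fout) } @lines;
}

# Close all cached handles once, at the end, so buffered output is flushed.
sub close_all {
    for my $name (keys %fh_for) {
        close $fh_for{$name} or die "Can't close $name: $!";
    }
    %fh_for = ();
}
```

In the loop above, the open/print pair would then become write_block($fout, @buff), with a single close_all() after the loop.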

Tested with data including multiple blocks within the same range and a block with N.95.
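One thing to watch with the <= 0.9 test is binary floating point: a value such as 4.9 is not stored exactly, so $num - $N can come out a hair above 0.9 and push the block into the next file. A minimal sketch of a more tolerant version, assuming non-negative values with at most two decimal places, rounds the fractional part to whole hundredths before comparing. The function name bucket_for is made up for this illustration.

```perl
use strict;
use warnings;

# Same rule as above (fraction up to 0.9 -> N, otherwise N+1), but the
# comparison is done on the fraction rounded to whole hundredths.
# Assumes $num is non-negative and has at most two decimal places.
sub bucket_for {
    my ($num) = @_;
    my $N = int $num;
    my $hundredths = sprintf '%.0f', ($num - $N) * 100;   # e.g. 90 for 4.9
    return $hundredths <= 90 ? $N : $N + 1;
}
```

The file name can then be built as 'name_' . bucket_for($num) . '.txt'. If the data carries more decimal places, the scale factor would need to grow accordingly.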