Buddy Buddy - 6 months ago 10
Perl Question

Parsing a file by summing up different columns of each row separated by blank line

I have an input file like the one below:


volume stats
start_time 1
length 2
--------
ID
0x00a,1,2,3,4
0x00b,11,12,13,14
0x00c,21,22,23,24

volume stats
start_time 2
length 2
--------
ID
0x00a,31,32,33,34
0x00b,41,42,43,44
0x00c,51,52,53,54

volume stats
start_time 3
length 2
--------
ID
0x00a,61,62,63,64
0x00b,71,72,73,74
0x00c,81,82,83,84



I need output in the format below:

1 33 36 39 42
2 123 126 129 132
3 213 216 219 222



Below is my code:

#!/usr/bin/perl
use strict;
use warnings;
#use File::Find;

# Define file names and its location
my $input = $ARGV[0];

# Grab the vols stats for different intervals
open (INFILE,"$input") or die "Could not open sample.txt: $!";
my $date_time;
my $length;
my $col_1;
my $col_2;
my $col_3;
my $col_4;
foreach my $line (<INFILE>)
{

    if ($line =~ m/start/)
    {
        my @date_fields = split(/ /,$line);
        $date_time = $date_fields[1];
    }
    if ($line =~ m/length/i)
    {
        my @length_fields = split(/ /,$line);
        $length = $length_fields[1];
    }
    if ($line =~ m/0[xX][0-9a-fA-F]+/)
    {
        my @volume_fields = split(/,/,$line);
        $col_1 += $volume_fields[1];
        $col_2 += $volume_fields[2];
        $col_3 += $volume_fields[3];
        $col_4 += $volume_fields[4];
        #print "$col_1\n";
    }
    if ($line =~ /^$/)
    {
        print "$date_time $col_1 $col_2 $col_3 $col_4\n";
        $col_1=0;$col_2=0;$col_3=0;$col_4=0;
    }
}
close (INFILE);



My code's result is:

1
33 36 39 42
2
123 126 129 132



Basically, for each time interval, it just sums each column across all the lines and displays the column sums against that time interval.

Answer

Basically, a block begins with start_time and ends with a line of whitespace. If the end of a block is instead guaranteed to be an empty line, you can change the test below.
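
To make the difference between the two end-of-block tests concrete, here is a small sketch that compares them on a few sample lines. The subroutine names are made up for this illustration and do not appear in the script further down.

```perl
use strict;
use warnings;

# Hypothetical predicates for the two end-of-block tests.
# Whitespace-only: the line contains no non-whitespace character.
sub is_whitespace_only { my ($line) = @_; return $line !~ /\S/ ? 1 : 0 }
# Empty: nothing before the (optional) trailing newline.
sub is_empty           { my ($line) = @_; return $line =~ /^$/ ? 1 : 0 }

for my $sample (
    [ "empty line",     "\n" ],
    [ "spaces and tab", "  \t\n" ],
    [ "stat line",      "0x00a,1,2,3,4\n" ],
) {
    my ($label, $line) = @$sample;
    printf "%-15s whitespace-only=%d empty=%d\n",
        $label, is_whitespace_only($line), is_empty($line);
}
```

The whitespace test is the more forgiving of the two: it still closes a block when the separator line carries stray spaces or tabs, while /^$/ accepts only a truly empty line.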

It helps to use arrays instead of variables with integer suffixes.
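
As a small illustration of that point, the sketch below keeps the running totals in one array instead of $col_1 through $col_4. The helper name add_row is hypothetical, and the sketch assumes every stat line is an ID followed by comma-separated numbers, as in the sample input.

```perl
use strict;
use warnings;

# Hypothetical helper: add the numeric fields of one stat line
# (ID first, then comma-separated values) to the running sums.
sub add_row {
    my ($sums, $line) = @_;
    chomp $line;
    my ($id, @vals) = split /,/, $line;
    $sums->[$_] += $vals[$_] for 0 .. $#vals;
    return $sums;
}

my @sums;
add_row(\@sums, $_)
    for "0x00a,1,2,3,4\n", "0x00b,11,12,13,14\n", "0x00c,21,22,23,24\n";
print "@sums\n";    # 33 36 39 42
```

With an array, the code does not need to change if the number of columns does.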

When you hit the start of a new block, record the start_time value, and clear the column sums. When you hit a stat line, update column sums, and when you hit a line of whitespace, print the column sums.

This way, you keep your program's memory footprint proportional to the longest line of input as opposed to the largest block of input. In this case, there isn't a huge difference, but, in real life, there can be. Your original program was reading the entire file into memory as a list of lines, because foreach evaluates <INFILE> in list context. That would cause your program's memory footprint to balloon with large inputs.

#!/usr/bin/env perl

use strict;
use warnings;

my $start_time;
my @cols;

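# Read one line at a time so that only the current line is held in memory.
while (my $line = <DATA>) {
    # Start of a block: remember its start_time.
    if ( $line =~ /^start_time \s+ ([0-9]+)/x) {
        $start_time = $1;
    }
    # Stat line: drop the ID and add each value to its column sum.
    elsif ( $line =~ /^0x/ ) {
        my ($id, @vals) = split /,/, $line;
        for my $i (0 .. $#vals) {
            $cols[ $i ] += $vals[ $i ];
        }
    }
    # Whitespace-only line: end of block, so print the sums and reset them.
    elsif ( !($line =~ /\S/) ) {
        if ( @cols ) {
            print join("\t", $start_time, @cols), "\n";
            @cols = ();
        }
    }
}

# The last block may not be followed by a blank line, so flush it here.
if ( @cols ) {
    print join("\t", $start_time, @cols), "\n";
}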

__DATA__
volume stats
start_time  1
length      2
--------
ID
0x00a,1,2,3,4
0x00b,11,12,13,14
0x00c,21,22,23,24

volume stats
start_time  2
length      2
--------
ID
0x00a,31,32,33,34
0x00b,41,42,43,44
0x00c,51,52,53,54

volume stats
start_time  3
length      2
--------
ID
0x00a,61,62,63,64
0x00b,71,72,73,74
0x00c,81,82,83,84

Output:

1   33  36  39  42
2   123 126 129 132
3   213 216 219 222
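
The script above reads from __DATA__ for demonstration. A minimal sketch of the same loop applied to the file named on the command line, as in the question, might look like the following. The subroutine name sum_blocks and its two-filehandle interface are assumptions made for this example, and it expects the input layout shown in the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical wrapper around the loop above: reads blocks from $in and
# prints one tab-separated summary line per block to $out.
sub sum_blocks {
    my ($in, $out) = @_;
    my ($start_time, @cols);
    while (my $line = <$in>) {              # scalar context: one line at a time
        chomp $line;
        if ($line =~ /^start_time\s+([0-9]+)/) {
            $start_time = $1;
        }
        elsif ($line =~ /^0x/i) {
            my ($id, @vals) = split /,/, $line;
            $cols[$_] += $vals[$_] for 0 .. $#vals;
        }
        elsif ($line !~ /\S/ and @cols) {   # blank line closes the block
            print {$out} join("\t", $start_time, @cols), "\n";
            @cols = ();
        }
    }
    # Flush the last block if the file does not end with a blank line.
    print {$out} join("\t", $start_time, @cols), "\n" if @cols;
    return;
}

if (@ARGV) {
    # Three-argument open with a lexical filehandle.
    open my $in, '<', $ARGV[0] or die "Could not open $ARGV[0]: $!";
    sum_blocks($in, \*STDOUT);
    close $in;
}
```

Two details probably explain the output shown in the question. split(/ /, $line) leaves the trailing newline attached to the last field, so $date_time prints as "1" followed by a line break; capturing the digits with a regex (or calling chomp first) avoids that. And if the file does not end with a blank line, the last block is never printed unless the sums are flushed after the loop.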