mropa mropa -4 years ago 83
Perl Question

Transforming irregular data into a structured table with Perl

I would like to transform the following lines of data into a structured table with perl:


Each value before the equal sign represents the identifier of the
corresponding value. Now I like to produce a table for further
analysis. The first row should include the identifiers and below their values. There is an unequal amount of identifiers in the
three rows which have to be kept in mind. Missing values should be
filled with the string "NA". So it should look like this



My approach was to first get all the variables. That would be the header/first row for the table. E.g.

my @data_dirty = <STDIN>;

## get the columns names
my ( @tmp, @var );
foreach my $j (@data_dirty) {
foreach my $i (split /\|/, $j) {
$i =~ s/\[.*\]//g;
$i =~ s/\s+//g;
$i =~ s/(.*)=.*/$1/g;
push(@tmp, $i);
@var = uniq @tmp;

Afterwards I might check each row if the variable in
exists, if yes write the corresponding value otherwise write an "NA". However, I had some troubles with the checks and the correct storing of the data so that afterwards the output looks as desired.

Answer Source

The key to addressing such needs is to divide and conquer. A careful choice of subroutines is all you need.

Here, data needs to be loaded and all identifiers need to be known a priori before any table is printed.

The example below doesn't order the columns by their sequence of appearance (left as an exercise for the reader):

use strict;
use warnings;

my ( $data, $headers ) = load_data( 'tabular_data.txt' );

print_tabular( $data, $headers );

sub load_data {

    my ( $file ) = @_;

    open my $fh, '<', $file or die $!;

    my ( @records, %fields );
    while ( my $line = <$fh> ) {

        chomp $line;

        my @columns = split /\|/, $line;               # Get columns
        my %entries = map { split /=/, $_ } @columns;  # Populate record,
                                                       # keys = fields
        push @records, \%entries;                      # Add to data collection

        $fields{$_}++ for keys %entries;               # Detect new headers

    close $fh;

    return ( \@records, [ keys %fields ] );

sub print_tabular {

    my ( $data, $headers ) = @_;

    pretty_print( @$headers );

    for my $record ( @$data ) {
        my @values = map { exists $record->{$_}  # exists check needed...
                             ? $record->{$_}     # ... otherwise header '0'
                             : 'NA'              # ... would always print 'NA'
                         } @$headers;
        pretty_print( @values );

sub pretty_print { print join( '|', @_ ), "\n" }
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download