LearningCpp LearningCpp - 2 months ago 8
Perl Question

Handling Nested Delimiters in perl

use strict;
use warnings;

my %result_hash = ();
my %final_hash = ();
Compare_results();

foreach my $key (sort keys %result_hash ){
print "$key \n";
print "$result_hash{$key} \n";
}

sub Compare_results
{

while ( <DATA> )
{
my($instance,$values) = split /\:/, $_;
$result_hash{$instance} = $values;

}
}
__DATA__
1:7802315095\d\d,7802315098\d\d;7802025001\d\d,7802025002\d\d,7802025003\d\ d,7802025004\d\d,7802025005\d\d,7802025006\d\d,7802025007\d\d
2:7802315095\d\d,7802025002\d\d,7802025003\d\d,7802025004\d\d,7802025005\d\d,7802025006\d\d,7802025007\d\d


Output

1
7802315095\d\d,7802315098\d\d;7802025001\d\d,7802025002\d\d,7802025003\d\d,7802025004\d\d,7802025005\d\d,7802025006\d\d,7802025007\d\d

2
7802315095\d\d,7802025002\d\d,7802025003\d\d,7802025004\d\d,7802025005\d\d,7802025006\d\d,7802025007\d\d


Iam trying to fetch value of each key and again trying to split the comma seperated value from result hash , if i find a semicolon in any value i would want to store the left and right values in separate hash keys.

Something like below

1.#split the value of result_hash{$key} again by , and see whether any chunk is seperated by ;
2. #every chunk without ; and value on left with ; should be stored in
@{$final_hash{"eto"}} = ['7802315095\d\d','7802315098\d\d','7802025002\d\d','7802025003\d\d','7802025004\d\d','7802025005\d\d','7802025006\d\d','7802025007\d\d'] ;
3.#Anything found on the right side of ; has to be stored in
@{$final_hash{"pro"}} = ['7802025001\d\d'] ;


Is there a way that i can handle everything in the subroutine? Can i make the code more simpler

Update :

I tried splitting the string in a single shot, but its just picking the values with semicolon and ignoring everything

foreach my $key (sort keys %result_hash ){
# print "$key \n";
# print "$result_hash{$key} \n";
my ($o,$t) = split(/,|;/, $result_hash{$key});
print "Left : $o \n";
print "Left : $t \n";
#push @{$final_hash{"eto"}}, $o;
#push @{$final_hash{"pro"}} ,$t;
}


}

My updated code after help

sub Compare_results
{
open my $fh, '<', 'Data_File.txt' or die $!;
# split by colon and further split by , and ; if any (done in insert_array)
my %result_hash = map { chomp; split ':', $_ } <$fh> ;
foreach ( sort { $a <=> $b } (keys %result_hash) )
{
($_ < 21) ? insert_array($result_hash{$_},"west"):insert_array($result_hash{$_},"east");
}
}


sub insert_array()
{
my ($val,$key) = @_;
foreach my $field (split ',', $val)
{
$field =~ s/^\s+|\s+$//g; # / turn off editor coloring
if ($field !~ /;/) {
push @{ $file_data{"pto"}{$key} }, $field ;
}
else {
my ($left, $right) = split ';', $field;
push @{$file_data{"pto"}{$key}}, $left if($left ne '') ;
push @{$file_data{"ero"}{$key}}, $right if($right ne '') ;
}
}
}


Thanks

Answer

Update   Added a two-pass regex, at the end


Just proceed systematically, analyze the string step by step. The fact that you need consecutive splits and a particular separation rule makes it unwieldy to do in one shot. Better have a clear method than a monster statement.

use warnings 'all';
use strict;   
use feature 'say';

my (%result_hash, %final_hash); 

Compare_results(); 

say "$_ => $result_hash{$_}" for sort keys %result_hash;
say '---';
say "$_ => [ @{$final_hash{$_}} ]" for sort keys %final_hash;

sub Compare_results 
{   
    %result_hash = map { chomp; split ':', $_ } <DATA>;

    my (@eto, @pro);
    foreach my $val (values %result_hash)
    {   
        foreach my $field (split ',', $val)
        {   
            if ($field !~ /;/) { push @eto, $field }
            else { 
                my ($left, $right) = split ';', $field;
                push @eto, $left;
                push @pro, $right;
            }
        }    
    }        
    $final_hash{eto} = \@eto;
    $final_hash{pro} = \@pro;
    return 1;                  # but add checks above
}

There are some inefficiencies here, and no error checking, but the method is straightforward. If your input is anything but smallish please change the above to process line by line, what you clearly know how to do. It prints

1 => ...  (what you have in the question)
---
eto => [ 7802315095\d\d 7802315098\d\d 7802025002\d\d 7802025003\d\ d ...
pro => [ 7802025001\d\d ]

Note that your data does have one loose \d\ d.


We don't need to build the whole hash %result_hash for this but only need to pick the part of the line after :. I left the hash in since it is declared global so you may want to have it around. If it in fact isn't needed on its own this simplifies considerably

sub Compare_results {
    my (@eto, @pro);
    while (<DATA>) {
        my ($val) = /:(.*)/;
        foreach my $field (split ',', $val)
        # ... same
    }
    # assign to %final_hash, return from sub
}

Thanks to ikegami for comments.


Just for the curiosity's sake, here it is in two passes with regex

sub compare_rx {
    my @data = map { (split ':', $_)[1] } <DATA>;
    $final_hash{eto} = [ map { /([^,;]+)/g  } @data ];
    $final_hash{pro} = [ map { /;([^,;]+)/g } @data ];
    return 1;
}

This picks all characters which are not , or ;, using the negated character class, [^,;]. So that is up to the first either of them, left to right. It does this globally, /g, so it keeps going through the string, collecting all fields that are "left of" , or ;. Then it cheats a bit, picking all [^,;] that are right of ;. The map is used to do this for all lines of data.

If %result_hash is needed build it instead of @data and then pull the values from it with my @values = values %hash_result and feed the map with @values.

Or, broken line by line (again, you can build %result_hash instead of taking $data directly)

my (@eto, @pro);
while (<DATA>) {
    my ($data) = /:(.*)/;
    push @eto, $data =~ /([^,;]+)/g; 
    push @pro, $data =~ /;([^,;]+)/g;
}