voltas - 1 month ago
Perl Question

Group input values using Perl

I have decided to learn Perl and am trying to use it for an assignment. Below is a flat file containing worker details.

------------------------------------------------------------
Worker_id: 8CA980
Name: User1
Checkin_Time: Mon, 6 Jun 2016 09:09:28
Address: Floor: 1
Street: lane 2
City: Some city
State: Some state
Access:
/org/company/building_1
/org/company/building_2
/org/company/building_3
------------------------------------------------------------
Worker_id: 128AD6
Name: User2
Checkin_Time: Mon, 6 Jun 2016 10:09:28
Address: Floor: 2
Street: lane 3
City: Some city
State: Some state
Access:
/org/company/building_1
/org/company/building_2
------------------------------------------------------------
Worker_id: 699A0
Name: User1
Checkin_Time: Mon, 6 Jun 2016 08:15:00
Address: Floor: 1
Street: lane 3
City: Some city
State: Some state
Access:
/org/company/building_1
------------------------------------------------------------


What I'm trying to accomplish is to parse the file and store its values like this:

@worker = <all the worker ids>
@name = <all the worker names>
@time = <all the worker check-in time>
@address = <worker address>
@access = <worker's access>


My code snippet:

#!/usr/bin/perl

use 5.010;
open (my $FH, '<', 'C:\\temp\\details.txt') or die "Can't read file: $!\n";
$/ = "*****"; # to change the default input separator from new line to some other
while (<$FH>) {
@temp=split(/-{6,}/, $_);
}
close ($FH);
shift(@temp); #used shift as there was a empty array field

for ( $k=0; $k<@temp; $k++) {
@temp1 = split(/\n/, $temp[$k]);
@temp2 = @temp1;
print "1st VALUE ===> $temp2[0]\n";
print "2nd VALUE ===> $temp2[1]\n";
print "3rd VALUE ===> $temp2[3]\n";
......
.....
}


Output is

1st VALUE ===>
2nd VALUE ===> Worker_id: 8CA980
3rd VALUE ===> Checkin_Time: Mon, 6 Jun 2016 09:09:28
4th VALUE ===> Address: Floor: 1
5th VALUE ===> Street: lane 2
6th VALUE ===> City: Some city
7th VALUE ===> State: Some state
8th VALUE ===> Access:

1st VALUE ===>
2nd VALUE ===> Worker_id: 128AD6
3rd VALUE ===> Checkin_Time: Mon, 6 Jun 2016 10:09:28
4th VALUE ===> Address: Floor: 2
5th VALUE ===> Street: lane 3
6th VALUE ===> City: Some city
7th VALUE ===> State: Some state
8th VALUE ===> Access:

.........


Since I'm splitting values on newlines, the Address and Access details are not stored as a single array element or value. I can't work out an effective way to organize the details into the array format described above.

Yes, the dirty way would be to split the values on every unique key (Worker_id, then Name, Time and so on), but that would be absurd.

Could you please help me here with any suggestions? Thanks.

Answer

First off - I wouldn't use separate arrays. A single data structure - an array of hashes - looks more sensible for your use case.
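
To make that shape concrete, here is a minimal sketch of an array of hashes, with two of your sample workers typed in by hand. The key names (worker, name, access) are just illustrative choices, not anything Perl requires. Each element is a hash reference, so all the fields of one worker stay together:

```perl
use strict;
use warnings;

# Hand-written sample records; in practice these would come from parsing the file.
my @records = (
    {   worker => '8CA980',
        name   => 'User1',
        access => [
            '/org/company/building_1',
            '/org/company/building_2',
            '/org/company/building_3',
        ],
    },
    {   worker => '128AD6',
        name   => 'User2',
        access => [ '/org/company/building_1', '/org/company/building_2' ],
    },
);

# Each $rec is a hash reference; nested lists are reached by dereferencing.
for my $rec (@records) {
    printf "%s (%s) has %d access path(s)\n",
        $rec->{name}, $rec->{worker}, scalar @{ $rec->{access} };
}
```

With parallel arrays you would have to keep five indexes in step; here a worker's name and access list can never drift apart.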

Secondly - $/ is your friend. It's the input record separator, and lets you iterate 'record by record' - and you have a clear one in the lines of dashes.

And that looks something like this:

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

local $/ = "\n--";

my @records;

while (<DATA>) {
   chomp;
   my ($worker) = m/Worker_id: (.*)/g;
   next unless $worker;
   my ($name)    = m/Name: (.*)/g;
   my ($checkin) = m/Checkin_Time: (.*)/g;

   #slightly more complicated patterns for multi-line fields:
   #capture lazily up to the next 'word:' at the start of a line,
   #a ---- line, or the end of the record. The capture is lazy (.*?)
   #because a greedy .* under /s would swallow the rest of the record.
   my ($address) = m/Address: (.*?)(?=^\w+:|^---|\z)/ms;
   my ($access)  = m/Access:\s*\n(.*?)(?=^\w+:|^---|\z)/ms;
   $address =~ s/\s*\n\s*/, /g;
   push(
      @records,
      {  worker  => $worker,
         name    => $name,
         checkin => $checkin,
         access  => $access,
         address => $address
      }
   );

}

print Dumper \@records;

__DATA__
------------------------------------------------------------
Worker_id: 8CA980
Name: User1
Checkin_Time: Mon, 6 Jun 2016 09:09:28 
Address: Floor: 1
         Street: lane 2 
         City: Some city
         State: Some state
Access: 
/org/company/building_1
/org/company/building_2
/org/company/building_3
------------------------------------------------------------
Worker_id: 128AD6
Name: User2
Checkin_Time: Mon, 6 Jun 2016 10:09:28 
Address: Floor: 2
         Street: lane 3 
         City: Some city
         State: Some state
Access: 
/org/company/building_1
/org/company/building_2
------------------------------------------------------------
Worker_id: 699A0
Name: User1
Checkin_Time: Mon, 6 Jun 2016 08:15:00 
Address: Floor: 1
         Street: lane 3 
         City: Some city
         State: Some state
Access: 
/org/company/building_1
------------------------------------------------------------
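
The program above reads from the inline DATA handle. For a real file the same loop should work with a lexical filehandle, and it is worth limiting the change to $/ to a block so that other reads in the program are unaffected. A minimal sketch, assuming the same dash-line layout; the $sample string is only a stand-in for your details.txt so that the snippet runs anywhere, and only the Worker_id field is extracted:

```perl
use strict;
use warnings;

# Stand-in for the real file (an assumption for this sketch). To read from disk,
# replace \$sample in the open call with a path such as 'C:\\temp\\details.txt'.
my $sample = join "\n",
    '------------------------------------------------------------',
    'Worker_id: 8CA980',
    'Name: User1',
    '------------------------------------------------------------',
    'Worker_id: 128AD6',
    'Name: User2',
    '------------------------------------------------------------',
    '';

# Opening a reference to a scalar gives an in-memory filehandle.
open my $fh, '<', \$sample or die "Can't read sample: $!\n";

my @ids;
{
    # The separator change only applies inside this block.
    local $/ = "\n--";
    while ( my $record = <$fh> ) {
        chomp $record;    # removes the trailing "\n--", if present
        my ($id) = $record =~ /Worker_id: (.*)/;
        next unless defined $id;    # skip chunks with no worker, e.g. the last dash line
        push @ids, $id;
    }
}
close $fh;

print "$_\n" for @ids;
```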

It might also make sense to array-ify your 'access' field:

     access  => [split /\n/, $access],

With that change, this gives you as output:

$VAR1 = [
          {
            'address' => 'Floor: 1, Street: lane 2, City: Some city, State: Some state, ',
            'access' => [
                          '/org/company/building_1',
                          '/org/company/building_2',
                          '/org/company/building_3'
                        ],
            'checkin' => 'Mon, 6 Jun 2016 09:09:28 ',
            'worker' => '8CA980',
            'name' => 'User1'
          },
          {
            'worker' => '128AD6',
            'address' => 'Floor: 2, Street: lane 3, City: Some city, State: Some state, ',
            'access' => [
                          '/org/company/building_1',
                          '/org/company/building_2'
                        ],
            'checkin' => 'Mon, 6 Jun 2016 10:09:28 ',
            'name' => 'User2'
          },
          {
            'name' => 'User1',
            'address' => 'Floor: 1, Street: lane 3, City: Some city, State: Some state, ',
            'access' => [
                          '/org/company/building_1'
                        ],
            'checkin' => 'Mon, 6 Jun 2016 08:15:00 ',
            'worker' => '699A0'
          }
        ];
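
If you still want the per-field lists from your question, they fall out of the array of hashes with map. A minimal sketch, assuming @records has the shape shown in the dump above; only two records and three keys are hard-coded here to keep it short, and %by_id is a hypothetical name for a lookup table keyed on the worker id:

```perl
use strict;
use warnings;

# Trimmed-down copy of the parsed structure (an assumption for this sketch).
my @records = (
    { worker => '8CA980', name => 'User1', checkin => 'Mon, 6 Jun 2016 09:09:28' },
    { worker => '128AD6', name => 'User2', checkin => 'Mon, 6 Jun 2016 10:09:28' },
);

# One list per field, in file order, like the arrays in the question.
my @worker = map { $_->{worker} } @records;
my @name   = map { $_->{name} } @records;

# Index the same records by worker id for direct lookup.
my %by_id = map { ( $_->{worker} => $_ ) } @records;

print "Workers: @worker\n";
print "Names: @name\n";
printf "128AD6 checked in at %s\n", $by_id{'128AD6'}{checkin};
```

The hash references are shared, not copied, so %by_id and @records point at the same data.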