stayingsong - 1 year ago
Perl Question

How do multiple arrow operators in series work in Perl?

I ran across a piece of Perl code today that I wasn't sure how to interpret, specifically the line

$lookup -> {$chr} -> {$start} = $end

I am not sure how multiple infix dereference operators work in series.

The input file contains tab-delimited chromosome names ($chr), start positions ($start), and end positions ($end) on each line. I get that the author is creating a hash table where $hash maps each chromosome to an array of its $start values, but I can't establish exactly what he is trying to accomplish with the next line. Any insight would be much appreciated.

my $hash;
my $lookup;
if (defined $bed_file) {
    open(FILE, $bed_file);
    while (my $line = <FILE>) {
        chomp $line;
        my ($chr, $start, $end) = split(/\t/, $line);
        push(@{$hash -> {$chr}}, $start);
        $lookup -> {$chr} -> {$start} = $end;
    }
}

Answer
$lookup -> {$chr} -> {$start} = $end

$lookup is (being treated as) a reference to a hash of hashes; a reference is Perl's counterpart to what other languages call a pointer. $chr is the first-level key, and its value is another hash reference. $start is the second-level key, and its value is $end.
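To make the chain concrete, here is a minimal sketch that performs the same assignment and then reads the value back one arrow at a time. The sample values and the names $inner and $value are made up for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical sample values standing in for one line of the input file.
my ($chr, $start, $end) = ('chr1', 100, 200);

my $lookup = {};                       # reference to an empty outer hash
$lookup->{$chr} = {};                  # first-level value: reference to an inner hash
$lookup->{$chr}->{$start} = $end;      # second-level value: the end position

# Each arrow dereferences whatever the expression to its left evaluates to.
my $inner = $lookup->{$chr};           # step 1: fetch the inner hash reference
my $value = $inner->{$start};          # step 2: index into the inner hash

print "$value\n";                      # same value as $lookup->{$chr}->{$start}
```

Reading left to right, $lookup->{$chr} yields a hash reference, and the second arrow then looks up $start inside the hash that reference refers to.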

This code is relying on autovivification. Although $lookup is never initialized to anything, when you dereference an undefined variable in a context that stores into it, Perl creates the missing structure for you: if you pretend/believe that a structure exists, it exists. Ditto for the $hash variable (a hash of arrays).
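A small sketch of autovivification, using made-up sample values, shows both variables starting out undefined and ending up as nested structures. The last few lines illustrate a well-known side effect: testing a nested key also creates the intermediate level.

```perl
use strict;
use warnings;

my $lookup;                            # undef, never initialized
my $hash;                              # undef, never initialized

# Hypothetical sample values.
my ($chr, $start, $end) = ('chr1', 100, 200);

$lookup->{$chr}->{$start} = $end;      # creates the outer hash and the inner hash
push @{ $hash->{$chr} }, $start;       # creates the outer hash and the inner array

print ref($lookup), "\n";              # HASH
print ref($lookup->{$chr}), "\n";      # HASH
print ref($hash->{$chr}), "\n";        # ARRAY

# Side effect: checking a nested key vivifies the level above it.
my $found = exists $lookup->{chr2}->{5};
print exists $lookup->{chr2} ? "chr2 was created\n" : "chr2 absent\n";   # chr2 was created
```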

Another Perl feature, not employed here, is arrow collapsing: the arrow between two adjacent subscripts (of either sort, {} or []) is optional. So this code can also read:

$lookup->{$chr}{$start} = $end

possibly better revealing the two level hash structure.
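As a quick check that the spellings are equivalent, the sketch below reads the same element three ways and does the same for an array element. The data and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical data in the same shape as the structures in the question.
my $lookup = { chr1 => { 100 => 200 } };
my $hash   = { chr1 => [ 100, 300 ] };

my $with_arrows    = $lookup->{chr1}->{100};
my $collapsed      = $lookup->{chr1}{100};          # arrow between subscripts dropped
my $fully_explicit = ${ ${$lookup}{chr1} }{100};    # block dereference, no arrows at all

my $array_arrows    = $hash->{chr1}->[1];
my $array_collapsed = $hash->{chr1}[1];             # works for mixed {} and [] too

print "hash forms agree\n"  if $with_arrows == $collapsed && $collapsed == $fully_explicit;
print "array forms agree\n" if $array_arrows == $array_collapsed;
```

Only arrows between subscripts can be dropped. The first arrow after $lookup has to stay, because $lookup{chr1} would refer to an element of a different variable, the hash %lookup.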

$lookup and $hash at the top level are parallel hashes, in that their first-level keys are the same. The $hash structure appears to be an optimization, as it could be computed from $lookup, the difference being that $hash would preserve the file order of the $start values and $lookup would not.
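One way to derive $hash from $lookup might look like the sketch below. It is an illustration on made-up data, not the code from the original script, and it sorts the start positions numerically because keys returns them in no particular order.

```perl
use strict;
use warnings;

# Hypothetical data in the shape the loop in the question builds.
my $lookup = {
    chr1 => { 300 => 400, 100 => 200 },
    chr2 => { 50  => 75 },
};

my $hash;
for my $chr (keys %$lookup) {
    # The inner keys are the start positions; file order is lost,
    # so impose a numeric order instead.
    $hash->{$chr} = [ sort { $a <=> $b } keys %{ $lookup->{$chr} } ];
}

print join(',', @{ $hash->{chr1} }), "\n";   # 100,300
```

Besides ordering, a start position that occurs twice for the same chromosome would appear twice in the original $hash but only once here, since hash keys are unique and $lookup keeps only the last $end stored for it.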