n0pe - 8 months ago
Perl Question

Trying to understand this perl script

It seems very simple and I figured most of it out. But since Perl is loose with syntax, it's difficult for a newcomer to jump right in :)

my @unique = ();
my %seen   = ();
foreach my $elem ( @array ) {
    next if $seen{ $elem }++;
    push @unique, $elem;
}
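To try the snippet on its own, it needs an @array to work on. Here is a minimal runnable sketch; the list of fruit names is made-up sample data, not part of the perldoc example:

```perl
use strict;
use warnings;

# Hypothetical sample input; the original snippet assumes @array already exists.
my @array = qw(apple pear apple plum pear apple);

my @unique = ();
my %seen   = ();
foreach my $elem (@array) {
    # Postfix ++ yields the old count (0 the first time a value appears),
    # then increments it, so only repeats trigger "next".
    next if $seen{$elem}++;
    push @unique, $elem;
}

print "@unique\n";    # prints "apple pear plum"
```

The order of first appearance is preserved, since each value is pushed the first time it is encountered.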

This is right from the perldoc website. If I understand correctly, it can also be written as:

my @unique = ();
my %seen   = ();
my $elem;
foreach $elem ( @array ) {
    unless ( $seen{ $elem }++ ) {
        push( @unique, $elem );
    }
}
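One thing to watch in a block-style rewrite: next if X skips the rest of the loop body when X is true, so the push has to happen only when $seen{ $elem }++ is false, i.e. under unless (or an if with a negated condition). This sketch, using a hypothetical input list, contrasts the two forms:

```perl
use strict;
use warnings;

my @array = qw(a b a c b a);    # hypothetical sample input

# Pushes only the first occurrence of each value (same result as "next if").
my ( @unique, %seen );
foreach my $elem (@array) {
    unless ( $seen{$elem}++ ) {
        push @unique, $elem;
    }
}

# Pushes every repeat instead, because the old count is true for repeats.
my ( @repeats, %seen2 );
foreach my $elem (@array) {
    if ( $seen2{$elem}++ ) {
        push @repeats, $elem;
    }
}

print "unique:  @unique\n";     # unique:  a b c
print "repeats: @repeats\n";    # repeats: a b a
```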

So my understanding at this point is:

  • Declare an array named unique

  • Declare a hash named seen

  • Declare a variable named elem

  • Iterate over @array, each iteration is stored in $elem

  • If $elem is a key in the hash %seen (I have no idea what the
    ++ does), skip to the next iteration

  • Append $elem to the end of @unique

I'm missing 2 things:

  • When does anything get stored in %seen?

  • What does ++ do? (In every other language it increments, but I don't see how that works here.)

I know that the issue lies with this part:

$seen{ $elem }++

which I suspect is doing a bunch of different stuff at once. Is there a simpler, more verbose way of writing that line?

Thanks for the help


The ++ operator does essentially the same thing in Perl as it does in most other languages that have it: it increments a variable.

$seen{ $elem }++;

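increments a value in the %seen hash, namely the one whose key is $elem. Because this is the postfix form, the expression as a whole evaluates to the value before the increment, and that old value is what "next if" tests.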
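A small sketch of what the postfix and prefix forms return when applied to hash elements that don't exist yet (the keys apple and pear are arbitrary examples):

```perl
use strict;
use warnings;

my %seen;

# Postfix ++: the expression yields the value *before* the increment.
my $first  = $seen{apple}++;    # element did not exist: yields 0, stores 1
my $second = $seen{apple}++;    # yields 1, stores 2

# Prefix ++ would instead yield the value *after* the increment.
my $pre = ++$seen{pear};        # yields 1, stores 1

print "$first $second $seen{apple} $pre\n";    # prints "0 1 2 1"
```

So the first time a value shows up, $seen{$elem}++ is 0 (false) and the loop keeps going; every later time it is a positive number (true) and "next" fires.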

The "magic" is that if $seen{$elem} hasn't been defined yet, it's automatically created, as if it already existed and had the value 0; the ++ then sets it to 1. So it's equivalent to:

if (! exists $seen{$elem}) {
    $seen{$elem} = 0;
}
$seen{$elem}++;
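For the "more verbose" version asked about in the question, the whole next if $seen{ $elem }++; line can be spelled out step by step. This is a sketch with hypothetical sample data; the helper variable $count_before is introduced only for illustration:

```perl
use strict;
use warnings;

my @array = qw(x y x z y);    # hypothetical sample input
my @unique;
my %seen;

foreach my $elem (@array) {
    # Make sure the counter exists, starting at 0.
    if ( !exists $seen{$elem} ) {
        $seen{$elem} = 0;
    }

    # Remember the count *before* incrementing: this is the value
    # that "$seen{ $elem }++" evaluates to.
    my $count_before = $seen{$elem};
    $seen{$elem} = $seen{$elem} + 1;

    # A nonzero old count means $elem was already seen: skip it.
    if ($count_before) {
        next;
    }

    push @unique, $elem;
}

print "@unique\n";    # prints "x y z"
```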

I originally said this was called "autovivification", but as @ysth points out, that term actually refers to references springing into existence (see perldoc perlref). What's happening here is simply that ++ treats an undefined value as 0.
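For contrast, this is roughly what autovivification in the perlref sense looks like: a reference appears because an undefined value was used as if it already were one. The hash names %config and %lists are hypothetical:

```perl
use strict;
use warnings;

my %config;    # hypothetical, starts out empty
my %lists;     # hypothetical, starts out empty

# Storing through a nested key that does not exist yet makes Perl create an
# anonymous hash and put a reference to it in $config{db}.
$config{db}{host} = 'localhost';

# Likewise, dereferencing an undefined element as an array creates an
# anonymous array and stores a reference to it in $lists{fruit}.
push @{ $lists{fruit} }, 'apple';

print ref( $config{db} ), "\n";      # HASH
print ref( $lists{fruit} ), "\n";    # ARRAY
print $config{db}{host}, "\n";       # localhost
```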

Here's a revised version of your description:

  • Declare an array variable named @unique
  • Declare a hash variable named %seen
  • Declare a scalar variable named $elem
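  • Iterate over @array; on each iteration, $elem holds (is an alias for) the current element
  • Increment $seen{$elem}; if its value before the increment was true (that is, $elem was already seen), skip to the next iteration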
  • Append the value of $elem to the end of @unique
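To see when things get stored in %seen, this sketch (hypothetical input) runs the loop and then dumps the hash. Every element, repeat or not, passes through $seen{$elem}++, so %seen ends up counting occurrences:

```perl
use strict;
use warnings;

my @array = qw(red blue red green red);    # hypothetical sample input
my @unique;
my %seen;

foreach my $elem (@array) {
    next if $seen{$elem}++;
    push @unique, $elem;
}

# After the loop, %seen maps each distinct value to how many times it occurred.
foreach my $key ( sort keys %seen ) {
    print "$key => $seen{$key}\n";
}
# blue => 1
# green => 1
# red => 3
```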

@unique, %seen, and $elem are all variables. The punctuation character (known as the "sigil") indicates what kind of variable each of them is, and is best thought of as part of the name.
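A short illustration of the sigil point, using a deliberately reused name x: $x, @x and %x are three separate variables, and a single element of @x or %x is written with $ because the element itself is a scalar. That is also why the hash %seen is accessed as $seen{ $elem }.

```perl
use strict;
use warnings;

# Three distinct variables that happen to share the name "x".
my $x = 'scalar';
my @x = ( 'first', 'second' );
my %x = ( key => 'value' );

# A single element uses the $ sigil; the brackets say which container it is from.
print "$x\n";         # scalar
print "$x[1]\n";      # second  (element of @x)
print "$x{key}\n";    # value   (element of %x)
```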