Gregory Nisbet - 7 months ago
Perl Question

for loop doesn't modify `my` variable but does modify `our` variable

In Perl 5.20, a for loop seems to be able to modify a module-scoped variable but not a lexical variable in a parent scope (and doesn't introduce a new scope if

#!/usr/bin/env perl
use strict;
use warnings;

our $x;

sub print_func {
    print "$x\n";
}

for $x (1 .. 10) {
    print_func;
}


prints 1 through 10 like you would expect, but the following does not:

#!/usr/bin/env perl
use strict;
use warnings;

my $x;

sub print_func {
    print "$x\n";
}

for $x (1 .. 10) {
    print_func;
}


emits the following warning 10 times:

Use of uninitialized value $x in concatenation (.) or string at perl-scoping.pl line 8.


What's going on here? I know that Perl subroutines cannot be nested (and always have module scope) and therefore it seems logical that they wouldn't be able to close over `my` variables. It seems like in that case, Perl in `strict` mode should reject the second program with a message like the following:

Global symbol "$x" requires explicit package name at perl-scoping.pl line 6.
Global symbol "$x" requires explicit package name at perl-scoping.pl line 9.


I.e. it should reject the subroutine because the free variable isn't declared anywhere, and the for loop because the variable hasn't been declared.

Why is Perl behaving this way?

Answer

It's confusing, but documented, behavior, probably stemming from the bad decision to make the loop iterator variable an implicitly localized global rather than a lexical. From Foreach Loops in perlsyn:

If the variable is preceded with the keyword my, then it is lexically scoped, and is therefore visible only within the loop. Otherwise, the variable is implicitly local to the loop and regains its former value upon exiting the loop. If the variable was previously declared with my, it uses that variable instead of the global one, but it's still localized to the loop.

To put it another way, the loop iterator is always localized to the loop. If it's a global then it acts like it's been declared local inside the loop block. If it's a lexical, then it acts like it's been declared with my inside the loop block.

Applying this to your two examples will help you understand what's going on.

our $x;

sub print_func {
    print "$x\n";
}

for $x (1 .. 10) {
    print_func; 
}

There's an implicit local $x on that loop. local really should have been named temp. It temporarily overrides the value of a global variable for the duration of its scope, but it's still a global. That's why print_func can see it.

The old value is restored when its scope ends. You can see this if you add a print $x after the for loop.

use v5.10;

our $x = 42;

for $x (1 .. 10) {
    say $x;
}

say $x;  # 42
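
As a minimal sketch of the same effect without a loop, here is an explicit local doing what the for loop does implicitly. The helper name current_x and the @seen array are made up for illustration.

```perl
use strict;
use warnings;

our $x = 42;

# Hypothetical helper: reads the package variable $x, like print_func does.
sub current_x { return $x }

my @seen;
{
    local $x = 7;              # temporarily override the global
    push @seen, current_x();   # the sub sees the temporary value
}
push @seen, current_x();       # old value is back once the block ends

print "@seen\n";               # prints "7 42"
```

The sub never mentions the block it was called from. It just reads the global, which is why it sees whatever value is currently in effect.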

Let's look at your code involving lexicals (my variables).

my $x;

sub print_func {
    print "$x\n";
}

for $x (1 .. 10) {
    print_func; 
}

What's really happening here is that you have two lexical variables both called $x. One is file scoped, one is scoped to the loop. The inner $x on the for loop takes precedence over the outer $x. This is known as "shadowing".

Lexicals cannot be seen outside their physical scope. print_func() only sees the outer uninitialized $x.
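
You can make this visible by giving the outer $x a value. This is a small sketch, with show_x and @seen as made-up names for illustration.

```perl
use strict;
use warnings;

my $x = 'outer';

# Hypothetical helper: closes over the file-scoped $x.
sub show_x { return $x }

my @seen;
for $x (1 .. 3) {
    push @seen, show_x();   # the sub never sees the loop's $x
}

print "@seen\n";   # prints "outer outer outer"
print "$x\n";      # prints "outer"; the loop did not change it
```

The sub keeps reading the file-scoped $x on every iteration, and that variable still holds its original value after the loop.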


There are some stylistic takeaways from this.

Always pass parameters into your functions.

In reality, print_func should take an argument. Then you don't have to worry about complicated scoping rules.

sub print_func {
    my $arg = shift;
    print "$arg\n";
}

for $x (1..10) {
    print_func($x);
}

Always use for my $x.

Don't rely on the complicated implicit for loop scoping rules. Always declare the loop iterator with my.

for my $x (1..10) {
    print_func($x);
}
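
Putting the first two takeaways together, a complete script that runs cleanly under strict and warnings might look like the sketch below. The names format_line and $output are made up for illustration; building a string instead of printing inside the sub is just one way to do it.

```perl
use strict;
use warnings;

# Hypothetical helper: takes its input as an argument instead of
# reaching for a variable in an outer scope.
sub format_line {
    my ($value) = @_;
    return "$value\n";
}

my $output = '';
for my $n (1 .. 3) {    # $n exists only inside the loop
    $output .= format_line($n);
}

print $output;          # prints 1, 2 and 3 on separate lines
```

Nothing here depends on whether the iterator is a global or a lexical, so the implicit localization rules never come into play.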

Avoid globals.

Since it's hard to tell what's accessing a global, don't use them. If you ever think you need a global, write a function instead to control access to a file-scoped lexical.

my $Thing = 42;
sub get_thing { return $Thing }
sub set_thing { $Thing = shift; return }
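
Here is a sketch of how that might be used. Wrapping the variable and its accessors in a bare block is an extra step, not something the snippet above requires; it narrows the lexical's scope so only the two subs can touch it.

```perl
use strict;
use warnings;

{
    my $Thing = 42;    # visible only inside this block
    sub get_thing { return $Thing }
    sub set_thing { $Thing = shift; return }
}

print get_thing(), "\n";   # prints 42
set_thing(7);
print get_thing(), "\n";   # prints 7
```

Because the subs are named, they are visible everywhere in the package, but $Thing itself cannot be read or changed except through them.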

Declare your variables close to where they're used.

Ye olde coding styles will do things like declare all their variables at the top of the file or function. This is a holdover from very, very, very old languages which required that variables be declared only in certain places. Perl, and most modern languages, have no such restriction.

If you declare your variables all at once, it's hard to know what they're for and what's using or affecting them. If you declare a variable close to its first use, that limits what can affect it and makes it more obvious what it's for.