aschultz - 2 months ago
Perl Question

Where to define local temp variables in Perl subroutine?

I took too long to start using warnings and strict in Perl, but now that I have, I see the advantages.

One of the things I'm still not sure about is when to define a temporary variable. This may seem like a trivial thing, but I run a lot of Monte Carlo simulations where losing a bit of time adds up over 10000+ iterations. I've been lazy about using strict/warnings on quicker simulations, but they've gotten more complex, so I really need to.

So (cutting out code to calculate stuff) I am wondering if

sub doStuff
{
    my $temp;
    for my $x (1..50)
    {
        $temp = $x**2;
    }
    for my $x (1..50)
    {
        $temp = $x**3;
    }
}


Or

sub doStuff
{
    for my $x (1..50)
    {
        my $temp = $x**2;
    }
    for my $x (1..50)
    {
        my $temp = $x**3;
    }
}


is more or less efficient, or whether one of them violates some Perl coding convention I don't know about yet.

Answer

The difference in efficiency between these two is small, and it is dwarfed by any realistic processing. So I'd go by code quality: if $temp really is temporary and unneeded after the loop, then it is better to declare it inside the loop (scoped to it), for all the usual reasons.
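One of those "usual reasons", beyond tidiness: a loop-scoped my creates a fresh lexical on every iteration, which matters the moment you capture it in a closure. A minimal sketch (not from the original answer) illustrating the difference:

```perl
use strict;
use warnings;

# With the declaration inside the loop, each iteration gets its own
# $temp, so each closure remembers its own value.
my @subs;
for my $x (1 .. 3) {
    my $temp = $x**2;            # fresh lexical every pass
    push @subs, sub { $temp };   # closes over this iteration's $temp
}
print join(',', map { $_->() } @subs), "\n";   # 1,4,9
```

Had $temp been declared once outside the loop, all three closures would share the same variable and report its final value.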

Since this is about optimization I'd like to digress. Such micro-issues may have an effect. However, where you really gain is first at the level of algorithms, and then by choosing appropriate data structures and techniques. The low-level tweaks are the very last thing to think about, and there are often language features and libraries that render them irrelevant. That said, one should know one's tools and not waste cycles needlessly.
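To make the data-structure point concrete (a hedged illustration, not part of the original answer): replacing a linear scan with a hash lookup typically buys far more than any declaration micro-tweak ever will.

```perl
use strict;
use warnings;

# Membership testing: a hash probe is O(1) per query, while grep
# rescans the whole list every time.
my @wanted = (17, 42, 99);
my %wanted = map { $_ => 1 } @wanted;   # build the lookup once

my @data = (1 .. 100);

# Slow: an inner grep walks @wanted for every element of @data.
my @slow = grep { my $d = $_; grep { $_ == $d } @wanted } @data;

# Fast: one hash probe per element of @data.
my @fast = grep { $wanted{$_} } @data;

print "@fast\n";   # 17 42 99
```

In a Monte Carlo loop running 10,000+ iterations, this kind of change dominates anything gained by moving a my around.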

Also, there is often a trade-off between code clarity and efficiency. If it comes to that, I suggest coding for correctness and clarity first. Then benchmark. Then optimize if needed, cautiously and gradually, with plenty of testing in between.

Here is an example of basic use of the core module Benchmark. I throw in an additional operation and add further cases with no temporary at all.

use warnings 'all';
use strict;    
use Benchmark qw(cmpthese);

my $x;

sub tmp_in {
    for (1..10_000) {
        my $tmp = 2 * $_;
        $x = $tmp + $_;
    }
    return $x;
}

sub tmp_out {
    my $tmp;
    for (1..10_000) {
        $tmp = 2 * $_;
        $x = $tmp + $_;
    }
    return $x;
}

sub no_tmp {
    for (1..10_000) { $x = 2 * $_ + $_ }
    return $x;
}

sub base {
    for (1..10_000) { $x += $_ }
    return $x;
}

sub calc { 
    for (1..10_000) { $x += sin sqrt(rand()) }
    return $x;
}         

cmpthese(-10, {
    tmp_in  => \&tmp_in,
    tmp_out => \&tmp_out,
    no_tmp  => \&no_tmp,
    base    => \&base,
    calc    => \&calc,
});

Output (on v5.16)

          Rate    calc  tmp_in tmp_out  no_tmp    base
calc     623/s      --    -11%    -26%    -44%    -59%
tmp_in   698/s     12%      --    -17%    -37%    -54%
tmp_out  838/s     34%     20%      --    -25%    -44%
no_tmp  1117/s     79%     60%     33%      --    -26%
base    1510/s    142%    116%     80%     35%      --

So they differ, and apparently a declaration inside a loop does cost something, though the two tmp versions are close to each other. Keep in mind that the loop bodies here do almost nothing, so the relative cost of the declaration is greatly exaggerated; there are other differences as well (no_tmp does its work in a single statement, for example). These micro-costs can matter only if your processing is dominated by the iteration itself. Just generating a (high-quality) pseudo-random number is comparatively expensive.

Also, results may differ wildly across hardware and software versions; my results with v5.10 on a better machine are somewhat different. Replace the sample 'calculations' with your actual processing, and run on the actual hardware, to get a relevant measure of whether any of this matters at all.
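As a sketch of that last suggestion, here is what plugging a more realistic Monte Carlo step into cmpthese might look like. The names step_a and step_b are placeholders for variants of your own processing, and the pi-estimation payload is just an assumed stand-in:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(sum);

# Variant A: explicit loop with a counter.
sub step_a {
    my $hits = 0;
    for (1 .. 1000) {
        my ($u, $v) = (rand, rand);
        $hits++ if $u*$u + $v*$v < 1;   # point inside the quarter circle
    }
    return $hits;
}

# Variant B: the same estimate written with map/sum.
sub step_b {
    return sum map { (rand()**2 + rand()**2) < 1 ? 1 : 0 } 1 .. 1000;
}

# Compare the two on your actual hardware.
cmpthese(-1, { step_a => \&step_a, step_b => \&step_b });
```

Benchmarking the real payload like this tells you directly whether style choices in the hot loop are worth worrying about for your simulations.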
