bjd2385 bjd2385 - 4 months ago 9
Perl Question

Interpreting "timethese" results for search algorithms



I'm a little confused by the following results and I was hoping some of you might be able to shed some light on why linear search appears to be faster than both binary and interpolation in Perl.

Benchmark: timing 1000000 iterations of Binary, Interpolation, Linear...
Binary: 17 wallclock secs (16.33 usr + 0.00 sys = 16.33 CPU) @ 61236.99/s (n=1000000)
Interpolation: 4 wallclock secs ( 3.65 usr + 0.00 sys = 3.65 CPU) @ 273972.60/s (n=1000000)
Linear: 2 wallclock secs ( 1.52 usr + 0.00 sys = 1.52 CPU) @ 657894.74/s (n=1000000)


Each of the functions are below. I'm trying to write a bunch of well known algorithms and following along in Mastering Algorithms with Perl.

sub LinearSearch {
# Search linearly for a value
my $val = $_[0];
my $arrptr = $_[1];

for (my $i=0; $i<ARR_LENGTH; ++$i) {
if ($arrptr->[$i] == $val) {
return $i;
}
}

return -1;
}


sub BinarySearch {
my $val = $_[0];
my $arrptr = $_[1];

my $low = 0;
my $high = ARR_LENGTH; # to be modified

while ($low <= $high) {
my $middle = int(($low + $high) / 2);
my $midValue = $arrptr->[$middle];

if ($midValue < $val) {
$low = $middle + 1;
} elsif ($midValue > $val) {
$high = $middle - 1;
} else {
return $middle;
}
}

return -1;
}


sub InterpolationSearch {
my $val = $_[0];
my $arrptr = $_[1];

my $low = 0;
my $high = ARR_LENGTH; # to be modified

while ($val >= $arrptr->[$low] && $val <= $arrptr->[$high]) {
# solve for the middle value again
my $middle = int($low + ($high - $low)*(($val - @{$arrptr}[$low])
/ (@{$arrptr}[$high] - @{$arrptr}[$low] + 1)));

my $middleVal = $arrptr->[$middle];

if ($middleVal < $val) {
$low = $middle + 1;
} elsif ($middleVal > $val) {
$high = $middle - 1;
} else {
return $middle;
}
}
return -1; # Not found
}


Additionally,
ARR_LENGTH
is defined as

use constant ARR_LENGTH => 10_000;


at the outset. It's really just odd that a binary search would take so long, and then interpolation less so, but still twice as long as Linear search.




Benchmarking code (just what I found online):

my @array = OrderedArray();
my $random_val = $array[int(rand(ARR_LENGTH))];
timethese(1_000_000, {
Interpolation => 'InterpolationSearch($random_val, \@array)',
Binary => 'BinarySearch($random_val, \@array)',
Linear => 'LinearSearch($random_val, \@array)' }
);


where
OrderedArray()
is just a quick (probably unnecessary) function

sub OrderedArray {
# Create a random ordered array
my @arr;

for (my $i=1; $i<=ARR_LENGTH; ++$i) {
push @arr, $i;
}

return @arr;
}

Answer

When you pass a string to timethese instead of a sub ref, your @array and $random_val variables aren't in the scope where Benchmark evals it. So it is not actually running with the data you specify.

Try running it as:

use Benchmark 'timethese';

use strict;
use warnings 'all';

use constant ARR_LENGTH => 10000;

my @array = OrderedArray();
my $random_val = $array[int(rand(ARR_LENGTH))];
timethese(
    -5,
    {
        'Interpolation' => sub { InterpolationSearch($random_val, \@array) },
        'Binary' => sub { BinarySearch($random_val, \@array) },
        'Linear' => sub { LinearSearch($random_val, \@array) },
    }
);

Enabling warnings reveals an error in InterpolationSearch. Enabling warnings and setting $random_val to ARR_LENGTH +1 reveals an error in BinarySearch. You might consider writing some test cases and validating your code before you worry about benchmarking.

You might prefer cmpthese to timethese; I don't find timethese output as helpful.

Comments