bjd2385 - 1 year ago 65
Perl Question

# Interpreting "timethese" results for search algorithms

I'm a little confused by the following results and I was hoping some of you might be able to shed some light on why linear search appears to be faster than both binary and interpolation in Perl.

``````Benchmark: timing 1000000 iterations of Binary, Interpolation, Linear...
Binary: 17 wallclock secs (16.33 usr +  0.00 sys = 16.33 CPU) @ 61236.99/s (n=1000000)
Interpolation:  4 wallclock secs ( 3.65 usr +  0.00 sys =  3.65 CPU) @ 273972.60/s (n=1000000)
Linear:  2 wallclock secs ( 1.52 usr +  0.00 sys =  1.52 CPU) @ 657894.74/s (n=1000000)
``````

Each of the functions are below. I'm trying to write a bunch of well known algorithms and following along in Mastering Algorithms with Perl.

``````sub LinearSearch {
# Search linearly for a value
my \$val = \$_[0];
my \$arrptr = \$_[1];

for (my \$i=0; \$i<ARR_LENGTH; ++\$i) {
if (\$arrptr->[\$i] == \$val) {
return \$i;
}
}

return -1;
}

sub BinarySearch {
my \$val = \$_[0];
my \$arrptr = \$_[1];

my \$low = 0;
my \$high = ARR_LENGTH;  # to be modified

while (\$low <= \$high) {
my \$middle = int((\$low + \$high) / 2);
my \$midValue = \$arrptr->[\$middle];

if (\$midValue < \$val) {
\$low = \$middle + 1;
} elsif (\$midValue > \$val) {
\$high = \$middle - 1;
} else {
return \$middle;
}
}

return -1;
}

sub InterpolationSearch {
my \$val = \$_[0];
my \$arrptr = \$_[1];

my \$low = 0;
my \$high = ARR_LENGTH;  # to be modified

while (\$val >= \$arrptr->[\$low] && \$val <= \$arrptr->[\$high]) {
# solve for the middle value again
my \$middle = int(\$low + (\$high - \$low)*((\$val - @{\$arrptr}[\$low])
/ (@{\$arrptr}[\$high] - @{\$arrptr}[\$low] + 1)));

my \$middleVal = \$arrptr->[\$middle];

if (\$middleVal < \$val) {
\$low = \$middle + 1;
} elsif (\$middleVal > \$val) {
\$high = \$middle - 1;
} else {
return \$middle;
}
}
}
``````

`ARR_LENGTH`
is defined as

``````use constant ARR_LENGTH => 10_000;
``````

at the outset. It's really just odd that a binary search would take so long, and then interpolation less so, but still twice as long as Linear search.

Benchmarking code (just what I found online):

``````my @array = OrderedArray();
my \$random_val = \$array[int(rand(ARR_LENGTH))];
timethese(1_000_000, {
Interpolation => 'InterpolationSearch(\$random_val, \@array)',
Binary        => 'BinarySearch(\$random_val, \@array)',
Linear        => 'LinearSearch(\$random_val, \@array)' }
);
``````

where
`OrderedArray()`
is just a quick (probably unnecessary) function

``````sub OrderedArray {
# Create a random ordered array
my @arr;

for (my \$i=1; \$i<=ARR_LENGTH; ++\$i) {
push @arr, \$i;
}

return @arr;
}
``````

When you pass a string to timethese instead of a sub ref, your @array and \$random_val variables aren't in the scope where Benchmark evals it. So it is not actually running with the data you specify.

Try running it as:

``````use Benchmark 'timethese';

use strict;
use warnings 'all';

use constant ARR_LENGTH => 10000;

my @array = OrderedArray();
my \$random_val = \$array[int(rand(ARR_LENGTH))];
timethese(
-5,
{
'Interpolation' => sub { InterpolationSearch(\$random_val, \@array) },
'Binary' => sub { BinarySearch(\$random_val, \@array) },
'Linear' => sub { LinearSearch(\$random_val, \@array) },
}
);
``````

Enabling warnings reveals an error in InterpolationSearch. Enabling warnings and setting \$random_val to ARR_LENGTH +1 reveals an error in BinarySearch. You might consider writing some test cases and validating your code before you worry about benchmarking.

You might prefer cmpthese to timethese; I don't find timethese output as helpful.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download