syker syker - 1 month ago 9
Perl Question

What is the easiest way to get a key with the highest value from a hash in Perl?

What is the easiest way to get a key with the highest value from a hash in Perl?

Answer

While the solution with sort:

(sort {$hash{$a} <=> $hash{$b}} keys %hash)[0]

found in some of the other answers is quite elegant, it doesn't perform as nicely as it looks. First off, the sort transforms an O(n) search search operation into an O(n log n) one. Secondly, the sort solution has n log n hash look-ups. Hash look-ups are very good for certain operations, but when working with the entire hash, look-ups will be slower than using each, keys, or values to iterate through the data structure. This is because the iterators do not need to calculate the hashes of keys, nor do they need to repeatedly walk through bins to find the values. And the overhead is not constant, but increasing as the hashes get larger.

Here are a few faster solutions:

use strict;
use warnings;

my %hash = (
    small   => 1,
    medium  => 5,
    largest => 10,
    large   => 8,
    tiny    => 0.1,
);

Here is a solution using the each iterator (an O(1) operation done n times):

sub largest_value (\%) {
    my $hash = shift;
    keys %$hash;       # reset the each iterator

    my ($large_key, $large_val) = each %$hash;

    while (my ($key, $val) = each %$hash) {
        if ($val > $large_val) {
            $large_val = $val;
            $large_key = $key;
        }
    }
    $large_key
}

print largest_value %hash; # prints 'largest'

Or a faster version that trades memory for speed (it makes a copy of the hash):

sub largest_value_mem (\%) {
    my $hash   = shift;
    my ($key, @keys) = keys   %$hash;
    my ($big, @vals) = values %$hash;

    for (0 .. $#keys) {
        if ($vals[$_] > $big) {
            $big = $vals[$_];
            $key = $keys[$_];
        }
    }
    $key
}

print largest_value_mem %hash; # prints 'largest'

Here is the performance with various hash sizes:

10 keys:              Rate largest_with_sort largest_value largest_value_mem
largest_with_sort 111565/s                --           -8%              -13%
largest_value     121743/s                9%            --               -5%
largest_value_mem 127783/s               15%            5%                --

50 keys:             Rate  largest_with_sort largest_value largest_value_mem
largest_with_sort 24912/s                 --          -37%              -40%
largest_value     39361/s                58%            --               -6%
largest_value_mem 41810/s                68%            6%                --

100 keys:            Rate  largest_with_sort largest_value largest_value_mem
largest_with_sort  9894/s                 --          -50%              -56%
largest_value     19680/s                99%            --              -12%
largest_value_mem 22371/s               126%           14%                --

1,000 keys:         Rate   largest_with_sort largest_value largest_value_mem
largest_with_sort  668/s                  --          -69%              -71%
largest_value     2183/s                227%            --               -7%
largest_value_mem 2341/s                250%            7%                --

10,000 keys:        Rate   largest_with_sort largest_value largest_value_mem
largest_with_sort 46.5/s                  --          -79%              -81%
largest_value      216/s                365%            --              -11%
largest_value_mem  242/s                421%           12%                --

As you can see, if memory isn't much of an issue, the version with internal arrays is fastest, closely followed by the each iterator, and in a distant third... sort