Rishi Rishi - 3 months ago 8
Perl Question

How to avoid loading of file into memory while sorting in perl

i have tried with some script for sorting a input text file in descending order and print only top usage customer.

input text file containts:

NAME,USAGE,IP
example :
Abc,556,10.2.3.5
bbc,126,14.2.5.6


and so on, this is very large file and i am trying to avoid loading file into memory.

I have tried with following script.

use warnings ;

use strict;

my %hash = ();
my $file = $ARGV[0] ;


open (my $fh, "<", $file) or die "Can't open the file $file: ";

while (my $line =<$fh>)
{
chomp ($line) ;
my( $name,$key,$ip) = split /,/, $line;

$hash{$key} = [ $name, $ip ];
}

my $count= 0 ;

foreach ( sort { $b <=> $a } keys %hash ){
my $value = $hash{$_};
print "$_ @{$value} \n" ;
last if (++$count == 5);
}


Output should be sorted based on usage and it will show the name and IP for respective usage. " `

Answer

Algorithm remembering the last 5 biggest rows.

For each row, check with the lowest memorized element. If more - are stored in the array before next biggest item with unshift lowest.

use warnings;
use strict;

my $file = $ARGV[0] ;
my @keys=(0,0,0,0,0);
my @res;

open (my $fh, "<", $file) or die "Can't open the file $file: ";
while(<$fh>)
{
 my($name,$key,$ip) = split /,/;
 next if($key<$keys[0]);
 for(0..4) {
  if($_==4 || $key<$keys[$_+1]) {
   @keys[0..$_-1]=@keys[1..$_] if($_>0);
   $keys[$_]=$key;
   $res[$_]=[ $name, $ip ];
   last;
  } 
 }
}
for(0..4) {
  print "$keys[4-$_]  @{$res[4-$_]}";
}

Test on file from 1M random rows (20 Mbytes):

Last items (This algorithm):
Start 1472567980.91183
End   1472567981.94729 (duration 1.03546 seconds)

full sort in memory (Algorithm of @Rishi):
Start 1472568441.00438
End   1472568443.43829 (duration 2.43391 seconds)

sort by parts of 100 rows (Algorithm of @Borodin):
Start 1472568185.21896
End   1472568195.59322 (duration 10.37426 seconds)
Comments