ZoCode ZoCode - 6 months ago 7
Perl Question

How to sort a file by its 3rd column using references in Perl

Hello guys, I have to sort a file of about 10k lines. I wrote the code below, but it takes a long time to finish. I asked someone, and he told me that using references would make it much faster, but I can't figure out where to use them. This is what I did in Perl:

use strict;
use warnings;

open( IN, "dico_corpus.dic" ) or die "$!";
my @tab;
my $i;
my @tabs;
my $c;
my @tabs2;
$i = 0;
$c = 0;
@tab = <IN>;
# here I read line by line and put the 3rd column (the one I want to sort by) in @tabs2
for ( $i = 0; $i <= $#tab; $i++ ) {    # <= so that the last line is included too
    @tabs = split( /\s+/, $tab[$i] );
    $tabs2[$c] = $tabs[2];
    $c++;
}    # here @tabs2 contains the 3rd column to sort

@tabs2 = sort(@tabs2);

open( OUT, ">>resultat.txt" );    # to print the result by appending line by line to resultat.txt

foreach my $word (@tabs2) {
    # here I take the first value in @tabs2,
    # then go through the lines of the original file
    # and test the 3rd column of each one: if it is
    # the same I print the whole line, if not I move
    # on to the next line
    foreach my $var (@tab) {
        @tabs = split( /\s+/, $var );

        if ( $word eq $tabs[2] ) {
            my $ligne = join( "\t", $tabs[1], $tabs[0], $tabs[2] );
            print OUT $ligne, "\n";
        }
    }
}

close(IN);
close(OUT);



Some lines from the original file:

3851 4178 de
1972 6643 la
1391 2246 à
1098 5163 et
656 8429 que

Answer

Indeed, the Schwartzian transform (ST) that @toto refers to can be used here. But I presume it may seem a bit obscure to you, so I'd like to show a more explicit solution. It will be slower than the ST, because it splits both lines again on every comparison, but it might be easier for beginners to read.

The first block simply reads the complete input file into the array @lines. I used the recommended three-argument form of open. See Perl's tutorial on open (perlopentut) for details.

Perl has a built-in sort function which, by default, sorts a list (or array) in string order (e.g. ('c', 'a', 'b') → ('a', 'b', 'c')). If that doesn't suit your needs you can also supply a custom comparison function, like I did here with by_third_column. sort does not pass arguments to this function; instead it sets the special variables $a and $b to the two items to be compared before each call. In your case $a and $b are two (arbitrary) complete lines of your input, and the function has to decide which line is "greater" by returning a negative number, zero, or a positive number.
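To make the difference between the default order and a custom comparison function concrete, here is a minimal sketch. The numbers are the first-column values from the sample lines in the question, and the sub name by_number is made up for this illustration.

```perl
use strict;
use warnings;

my @counts = ( 3851, 656, 1972, 1098 );

# Without a comparison function, sort compares the items as strings.
my @as_strings = sort @counts;

# Hypothetical comparison sub: sort sets $a and $b before each call,
# and <=> returns -1, 0 or 1 for a numeric comparison.
sub by_number {
    return $a <=> $b;
}

my @as_numbers = sort by_number @counts;

print "@as_strings\n";    # 1098 1972 3851 656
print "@as_numbers\n";    # 656 1098 1972 3851
```

The string order puts 656 last because the character '6' sorts after '1' and '3'. For the third column of the file, which holds words, the string comparison cmp is the one that fits.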

So the function by_third_column splits each of the two given lines at whitespace and picks the 3rd item ("field") of each. This is the my $a3 = … and my $b3 = … part. Then these 3rd fields are compared as strings ($a3 cmp $b3).
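The expression that picks the 3rd field is a list slice applied directly to the result of split. A small sketch, using one of the sample lines from the question as input (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $line = "3851 4178 de\n";

# split returns a list of fields; the trailing newline counts as
# whitespace, and the empty field it would produce is dropped.
my @fields = split /\s+/, $line;

# Wrapping the split in parentheses lets us index the list directly,
# without storing it in an array first. Index 2 is the 3rd field.
my $third = ( split /\s+/, $line )[2];

print scalar(@fields), "\n";    # 3
print "$third\n";               # de
```

Note that with /\s+/ a line that starts with whitespace would get an empty first field, which shifts the indexes by one. The sample lines do not start with whitespace, so this does not matter here.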

Finally we call sort on the @lines array but supply that custom compare function. The last block simply appends the sorted lines to the file 'resultat.txt'.

#!/usr/bin/env perl

use strict;
use warnings;

open( my $in, '<', 'dico_corpus.dic' ) or die "$!";
my @lines = <$in>;
close($in);

sub by_third_column
{
    my $a3 = ( split /\s+/, $a )[2];
    my $b3 = ( split /\s+/, $b )[2];
    return $a3 cmp $b3;
}

my @sorted = sort by_third_column @lines;

open( my $out, '>>', 'resultat.txt' ) or die "$!";
print $out @sorted;
close($out);
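
For comparison, this is roughly what the Schwartzian transform mentioned at the top could look like. It is only a sketch: the lines are hard-coded from the sample in the question instead of being read from the file, so that it runs on its own. This is also where references come in, since every line is temporarily wrapped in an array reference that holds the sort key next to the original line, and split is called only once per line.

```perl
use strict;
use warnings;

# Sample lines from the question; a real script would read them from the file.
my @lines = (
    "3851 4178 de\n",
    "1972 6643 la\n",
    "1098 5163 et\n",
    "656 8429 que\n",
);

# Read the chain from bottom to top:
my @sorted =
    map  { $_->[1] }                           # 3. unwrap: keep only the original line
    sort { $a->[0] cmp $b->[0] }               # 2. compare the precomputed 3rd fields
    map  { [ ( split /\s+/, $_ )[2], $_ ] }    # 1. wrap: [ 3rd field, original line ]
    @lines;

print @sorted;
# 3851 4178 de
# 1098 5163 et
# 1972 6643 la
# 656 8429 que
```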