Ger Cas - 1 year ago
Perl Question

Load field 1 and print at the END{}: the Perl equivalent of an awk script

I have the following awk script that counts occurrences of the elements in field 1 and, when it finishes reading the entire file, prints each element and the number of times it appears.

awk '{a[$1]++} END{ for(i in a){print i"-->"a[i]} }' file


I'm very new to Perl and don't know what the equivalent would be. What I have so far is below, but its syntax is incorrect. Thanks in advance.

perl -lane '$a{$F[1]}++ END{foreach $a {print $a} }' file


____________________________________UPDATE______________________________________

Hi, thanks to both of you for your answers. The real input file has 34 million lines, and awk runs roughly 2.5 to 3 times faster than Perl on it. Is awk faster than Perl?

awk '{a[$1]++}END{for(i in a){print i"-->"a[i]}}' file #--> approx. 2:45
perl -lane '$a{$F[0]}++;END{foreach my $k (keys %a){ print "$k --> $a{$k}" } }' file #--> approx. 7 min
perl -lanE'$a{$F[0]}++; END { say "$_ => $a{$_}" for keys %a }' file # --> approx. 9 min

Answer

Equivalent to your awk line

perl -lanE'$a{$F[0]}++; END { say "$_ => $a{$_}" for keys %a }' file

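With -n Perl loops over the input lines, -l strips each line's newline, and -a splits the line on whitespace into the array @F. Perl arrays are indexed from 0, so awk's $1 is $F[0] (your attempt used $F[1], which is the second field). You want $F[0] as the key in the hash %a, whose value is the counter incremented by ++. The -E switch enables say. In the END block the hash is iterated over its keys and each key is printed with its count; as with awk's for (i in a), the keys come out in no particular order.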
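For anyone new to Perl, here is roughly what those switches amount to when written out as a plain script. This is a sketch based on the perlrun description of -n, -l and -a, not literally what perl generates: to keep it self-contained it reads from an in-memory string instead of `file`, the sub name `count_first_fields` is made up for illustration, and unlike the one-liner it skips blank lines rather than counting them under an empty key.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';    # what -E turns on

# Hypothetical helper: counts how often each first field occurs,
# mirroring perl -lanE'$a{$F[0]}++; ...' and awk's a[$1]++.
sub count_first_fields {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {       # -n: loop over the input lines
        chomp $line;                 # -l: strip the trailing newline
        my @F = split ' ', $line;    # -a: split on whitespace into @F
        next unless @F;              # deliberate difference: skip blank lines
        $count{ $F[0] }++;           # $F[0] is awk's $1
    }
    return \%count;
}

# Stand-in for the real input file (an assumption for this sketch).
my $text = "apple 1\nbanana 2\napple 3\n  cherry 4\napple 5\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $count = count_first_fields($fh);
close $fh;

# END block equivalent: print every key with its count.
# Like awk's for (i in a), the key order is unspecified.
say "$_ => $count->{$_}" for keys %$count;
```

Leading whitespace is ignored by `split ' '`, so the line `  cherry 4` is counted under `cherry`, just as awk would count it.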

As for efficiency, one way to improve this is to avoid splitting the whole line into fields (which is what -a does), since only the first field is needed. Two ways come to mind:

perl -nE'$a{(/(\S+)/)[0]}++; END { ... }' 

and

perl -nE'$a{(split " ", $_, 2)[0]}++; END { ... }'
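Both alternatives should pick out the same key as -a: `split ' '` (which -a uses) skips leading whitespace and splits on runs of whitespace, and `/(\S+)/` matches the first run of non-whitespace characters. A small sketch to check that on a few awkward lines; the sample lines and sub names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three ways to get the first whitespace-separated field of $_.
# The sub names are hypothetical, chosen for this sketch.
sub by_autosplit { my @F = split ' ', $_; $F[0] }    # what -a does
sub by_regex     { (/(\S+)/)[0] }                     # regex variant
sub by_split2    { (split ' ', $_, 2)[0] }            # split with limit 2

# Awkward inputs: leading blanks, tabs, a single field,
# a trailing newline (no -l) and no newline at all.
my @lines = ("foo bar baz\n", "   foo bar\n", "\tfoo\tbar\n", "foo\n", "foo");

my @mismatches;
for (@lines) {    # $_ is aliased to each line in turn
    my @keys = (by_autosplit(), by_regex(), by_split2());
    push @mismatches, $_ if grep { $_ ne 'foo' } @keys;
}
print @mismatches ? "mismatch\n" : "all three agree\n";
```

The trailing newline never reaches the first field in any of the three variants, which is why the -n versions above work without -l.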

On an 8-million-line file, split is significantly faster: 3.63s versus 4.41s for the regex.

This is still behind the 1.99s of your awk line, so awk does seem to be faster for this task.


Summary of my timings for an 8-million-line file (average of a few runs):

command          time
awk  (question)  1.99s
perl (split)     3.63s
perl (regex)     4.41s
perl (like awk)  5.61s

These timings vary between runs by a few hundredths of a second (tens of milliseconds).
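To compare the Perl variants from inside Perl itself, the core Benchmark module's `cmpthese` can time them side by side. This is a rough sketch on synthetic in-memory data: the 10,000-line generator and the labels `autosplit`, `regex` and `split2` are made up for illustration. It times only the per-line counting work, not reading the file or starting perl, so its ratios need not match the wall-clock numbers above, and awk cannot be included this way.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic input (an assumption, not the real data):
# 10,000 five-field lines with 100 distinct first fields.
my @lines = map { join(' ', 'key' . ($_ % 100), 'b', 'c', 'd', 'e') . "\n" } 1 .. 10_000;

# Each sub counts first fields over all lines, like $a{...}++ in the one-liners.
my %subs = (
    autosplit => sub { my %a; for (@lines) { my @F = split ' ', $_; $a{ $F[0] }++ } \%a },
    regex     => sub { my %a; for (@lines) { $a{ (/(\S+)/)[0] }++ } \%a },
    split2    => sub { my %a; for (@lines) { $a{ (split ' ', $_, 2)[0] }++ } \%a },
);

# Turn a count hash into a sorted "key=count" string for comparison.
sub fingerprint {
    my ($h) = @_;
    return join ',', map { "$_=$h->{$_}" } sort keys %$h;
}

# Sanity check: all variants must agree before timing them.
my @names    = sort keys %subs;
my $expected = fingerprint($subs{ $names[0] }->());
for my $name (@names) {
    die "$name disagrees\n" unless fingerprint($subs{$name}->()) eq $expected;
}

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%subs);
```

`cmpthese` prints each variant's rate followed by the percentage difference between every pair, which makes the relative cost of the full autosplit easy to see.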
