hedgehog hedgehog - 7 months ago 18
Bash Question

What is the fastest way to print in awk?

I am trying to take some measurements, and I would like to know the fastest way to print something with nawk. At the moment I use printf ARR[2] " "; to print, but it seems to take longer than it should.

Info: I am printing around 500 numbers, and I add the space in the printf so that the output does not all run together. I am running the script under ksh on Oracle Solaris (Unix).

As written, it needs around 14 seconds to print everything. Is there a faster way to do this?

Thanks in advance!

UPDATE

The function I care about is awkfun, which I call with time in order to make my measurements.
Think of NUMBERS as a variable that holds 1000 random numbers, and XNUMBERS as a variable that holds 1000 random numbers in the format 123|321: each random number, then a |, then the same number reversed.
For each number in NUMBERS, I check whether it exists in XNUMBERS, and if it does I print only the reversed number.

numfun() {
    NUMBERS=`nawk ' BEGIN{
        srand();
        for (i=0; i<=999; i++) {
            printf("%s\n", 100 + int(rand() * (899)));
        }
    }'`
}
numfun
sleep 1
xnumfun() {
    XNUMBERS=`nawk ' BEGIN{
        srand();
        for (i=0; i<=999; i++) {
            XNUMBERS[i]= 100 + int(rand() * (899));
        }
        for (i=0; i<=999; i++) {
            ver=XNUMBERS[i] "";
            rev = "";
            for (q=length(ver); q!=0; q--) {
                rev = rev substr(ver, q, 1);
            }
            printf("%s\n", XNUMBERS[i] "|" rev );
        }
    }'`
}
xnumfun
awkfun() {
    for n in $NUMBERS
    do
        echo "${XNUMBERS}" | nawk -v VAR=$n '
        {
            split($1,ARR,"|")
            if (VAR == ARR[1]){
                printf ARR[2] " ";
                exit;
            }
        }'
    done
}
shellfun() {
    for n in $NUMBERS
    do
        for x in $XNUMBERS
        do
            if test "$n" -eq "${x%%\|*}"
            then
                echo "${x##*\|}";
                break;
            fi
            continue;
        done
    done
}
sleep 1
time awkfun;
echo "\nAWK TIME\n\n-----------------------------";
time shellfun;
echo "\nSHELL TIME\n\n-----------------------------";
time numfun;
echo "\nNUMBERS TIME\n\n-----------------------------";
time xnumfun;
echo "\nXNUMBERS TIME\n\n-----------------------------\n\nTOTAL TIME\n";

Answer

The reason your program is slow is not the printing. It is slow because you invoke a new copy of nawk for every element of $NUMBERS, which means about 1000 process launches, each scanning $XNUMBERS from the start. This is very wasteful, and you should rethink the program design from the beginning. It appears you are mostly trying to see which numbers from one list exist in a second list. If you want to do this in nawk, read the entire first list and store its elements in an associative array, then check each number from the second list against that array.
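
A minimal sketch of that single-process design, assuming the same $NUMBERS and $XNUMBERS variables as in the question. The function name awkfun2 and the END_PAIRS sentinel line are made up for illustration, and awk can be swapped for nawk on Solaris. It loads the pairs into an associative array first, then looks up each number, so only one awk process is started:

```shell
# Sketch only: awkfun2 and the END_PAIRS sentinel are illustrative names.
awkfun2() {
    {
        printf '%s\n' $XNUMBERS     # one "num|rev" pair per line
        echo 'END_PAIRS'            # sentinel between the two lists
        printf '%s\n' $NUMBERS      # one number per line
    } | awk '
        $0 == "END_PAIRS" { pairs_done = 1; next }
        !pairs_done {
            split($0, ARR, "|")
            # Keep the first pair seen for each number, like the original exit.
            if (!(ARR[1] in rev)) rev[ARR[1]] = ARR[2]
            next
        }
        # Lookup phase: print the reversed number if this one is known.
        ($0 in rev) { printf "%s ", rev[$0] }
        END { print "" }
    '
}
```

It can be timed the same way as the original, with time awkfun2. The sketch also passes an explicit "%s " format to printf instead of using the data itself as the format string, which would misbehave if a value ever contained a % character.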

You could probably solve this problem more cleanly using join or grep.
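
For the join route, a rough sketch could look like the following. The name joinfun and the temporary file path are assumptions for illustration. join needs both inputs sorted on the join field, so the matches come out in sorted order and without duplicates rather than in $NUMBERS order:

```shell
# Sketch only: joinfun and the temp file name are illustrative.
joinfun() {
    tmp=${TMPDIR:-/tmp}/xnumbers.$$
    # Sort the "num|rev" pairs on the number, one pair per number.
    printf '%s\n' $XNUMBERS | sort -t '|' -k1,1 -u > "$tmp"
    # Join the sorted numbers (stdin) with the pairs on field 1,
    # printing only field 2 of the pairs file, i.e. the reversed number.
    printf '%s\n' $NUMBERS | sort -u | join -t '|' -o 2.2 - "$tmp"
    rm -f "$tmp"
}
```

Whether the change in output order matters depends on what the measurements are for, so treat this as a starting point rather than a drop-in replacement.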


Edit: Here's a working solution using grep. It's at least 20x faster than your original shellfun().

shellfun2() {
    echo $XNUMBERS | tr ' ' '\n' | cut -d '|' -f1 \
        | grep -f <(echo $NUMBERS | tr ' ' '\n') | rev
}

The way it works is to take all the numbers from $XNUMBERS before the pipes (so 12|21 34|43 becomes 12\n34), then pipe those to grep with the -f argument being all of $NUMBERS. This means we search for all the left-hand sides of $XNUMBERS within $NUMBERS, and after printing the matches we simply use rev to reverse them. We don't need the right-hand sides of $XNUMBERS at all (so maybe you can even stop generating them in the first place, saving more time).
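
If your shell does not support the <( ) syntax, the matching stage can be sketched with a temporary pattern file instead. The name match_left_sides and the temp file path are made up for illustration. This variant also adds -x and -F so that grep only accepts whole-line, fixed-string matches; with plain -f, a pattern such as 12 would also match 123, which does not matter for three-digit numbers but could for other data:

```shell
# Sketch only: match_left_sides and the temp file name are illustrative.
match_left_sides() {
    pats=${TMPDIR:-/tmp}/numbers.$$
    # One search pattern per line.
    printf '%s\n' $NUMBERS > "$pats"
    # Left-hand sides of the pairs, filtered to those listed in $NUMBERS.
    printf '%s\n' $XNUMBERS | cut -d '|' -f1 | grep -x -F -f "$pats"
    rm -f "$pats"
}
```

Piping its output through rev, or an equivalent, gives the reversed numbers. Note that the results come out in $XNUMBERS order, not $NUMBERS order.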


Edit: Since you've now told us you are running on Solaris instead of Linux, you don't have rev, so you can replace rev in the above with this:

sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'

And you can replace grep with /usr/xpg4/bin/grep to get an enhanced version that supports -f.
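
To make the sed replacement easier to drop into a pipeline, it can be wrapped in a small function. The name revlines is an assumption for illustration, and I have only traced this against GNU sed behaviour, so check it against the Solaris sed before relying on it:

```shell
# Sketch only: revlines is an illustrative name for a rev stand-in.
revlines() {
    # Reverses the characters of each input line.
    sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
}

printf '123\n456\n' | revlines
```

With that in place, the last stage of the pipeline becomes | revlines instead of | rev.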