theGrayFox theGrayFox - 6 months ago 17
Linux Question

Shuffling with Awk

I have an array that needs to be shuffled and I would like to optimize the speed of this algorithm by using Awk. I'm still relatively new to using Awk and I was trying to figure out the best way of simulating this algorithm. How can this properly be done?

Bash Shuffle:

shuffle() {

local size limit rand i

size=${#password[*]}
limit=$(( 32768 / size * size))

for ((i=size-1; i > 0; i--)); do
while (((rand=$RANDOM) >= limit)); do :; done
rand=$((rand % (i+1)))
tmp=${password[i]}
password[i]=${password[rand]}
password[rand]=$tmp
done
}


Awk Attempt:

shuffle() {

local size limit rand i

size=${#password[*]}
limit=$(( 32768 / size * size))

awk -v rand=$RANDOM 'BEGIN {
srand(rand);
for(i=size-1; i>0; i--) {
while(rand >= limit);
rand=rand % i + 1;
tmp=password[i];
password[i]=password[rand];
password[rand]=tmp;
}
}'
}

Answer

awk has the rand function, which generates a random number between 0.0 and 1.0 (actually, strictly less than 1.0). To get a random integer in the range [0, i+1), use int(rand()*(i+1)). I don't think srand does what you think it does; srand sets the "seed" for awk's random number generator, which avoids generating the same random number sequence every time you invoke awk. Normally, the seed is set from something which changes frequently, like the time -- although that's not ideal -- or a value extracted from /dev/random.

A couple of observations:

1) I understand that your loop

while (((rand=$RANDOM) >= limit)); do :; done

is attempting to avoid bias in the random number generated by $RANDOM, since that number only has 16 bits and thus the bias might be noticeable. However, it will only avoid bias the first time through the loop, when i+1 == size, since limit was computed based on size. After that, limit will be the wrong value. You could improve the computation, or you could generate a random number with more random bits by using /dev/urandom, but personally I'd just use the shuf utility, which does what you want (randomly shuffles an input). Of course, that's more pragmatic than didactic; it's not as educational as writing your own shuffler.

2) That applies equally to the awk solution (i.e., why use awk when you can use shuf?). But in any event, assigning the awk variable rand from the bash magical $RANDOM does not make rand magical. awk and bash aren't linked in any way. (And you can't just use bash variables like limit in an awk script either.)