theGrayFox - 4 months ago 8x
Linux Question

# Shuffling with Awk

I have an array that needs to be shuffled and I would like to optimize the speed of this algorithm by using Awk. I'm still relatively new to using Awk and I was trying to figure out the best way of simulating this algorithm. How can this properly be done?

Bash Shuffle:

`````` shuffle() {

local size limit rand i

limit=\$(( 32768 / size * size))

for ((i=size-1; i > 0; i--)); do
while (((rand=\$RANDOM) >= limit)); do :; done
rand=\$((rand % (i+1)))
done
}
``````

Awk Attempt:

``````shuffle() {

local size limit rand i

limit=\$(( 32768 / size * size))

awk -v rand=\$RANDOM 'BEGIN {
srand(rand);
for(i=size-1; i>0; i--) {
while(rand >= limit);
rand=rand % i + 1;
}
}'
}
``````

awk has the `rand` function, which generates a random number between 0.0 and 1.0 (actually, strictly less than 1.0). To get a random integer in the range `[0, i+1)`, use `int(rand()*(i+1))`. I don't think `srand` does what you think it does; `srand` sets the "seed" for awk's random number generator, which avoids generating the same random number sequence every time you invoke `awk`. Normally, the seed is set from something which changes frequently, like the time -- although that's not ideal -- or a value extracted from `/dev/random`.

A couple of observations:

1) I understand that your loop

``````while (((rand=\$RANDOM) >= limit)); do :; done
``````

is attempting to avoid bias in the random number generated by `\$RANDOM`, since that number only has 16 bits and thus the bias might be noticeable. However, it will only avoid bias the first time through the loop, when `i+1 == size`, since `limit` was computed based on `size`. After that, `limit` will be the wrong value. You could improve the computation, or you could generate a random number with more random bits by using `/dev/urandom`, but personally I'd just use the `shuf` utility, which does what you want (randomly shuffles an input). Of course, that's more pragmatic than didactic; it's not as educational as writing your own shuffler.

2) That applies equally to the `awk` solution (i.e., why use `awk` when you can use `shuf`?). But in any event, assigning the `awk` variable `rand` from the `bash` magical `\$RANDOM` does not make `rand` magical. `awk` and `bash` aren't linked in any way. (And you can't just use bash variables like `limit` in an awk script either.)

Source (Stackoverflow)