user3639557 - 2 months ago
Bash Question

How rand() works in awk

I am trying to sample the 2nd column of a CSV file (any number of samples is fine) using awk and rand(). But I noticed that I always end up with the same number of samples every time I run it:

cat toy.txt | awk -F',' 'rand()<0.2 {print $2}' | wc -l


I explored further, and it seems rand() is not working as I expected. For example, a in the following command always seems to be 1:

cat toy.txt | awk -F',' 'a=rand() a<0.2 {print a}'


Why?
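A side note on the second command, which has its own problem separate from seeding. In awk, string concatenation binds more tightly than comparison, and assignment binds least tightly of all, so the pattern a=rand() a<0.2 is parsed as a = ((rand() a) < 0.2). That makes a hold the 0/1 result of a comparison, and because the pattern's value is a itself, only records where it equals 1 get printed. A minimal sketch that stores the random number first and tests it in a separate rule follows; the file sample.csv and its contents are hypothetical stand-ins for toy.txt.

```shell
# Hypothetical sample data standing in for the asker's toy.txt
seq 1 1000 | awk '{ print "row" $1 "," $1 }' > sample.csv

# Original pattern: parsed as a = ((rand() a) < 0.2), so a is 0 or 1
# and only the value 1 ever gets printed
awk -F',' 'a=rand() a<0.2 {print a}' sample.csv | sort -u

# Intended version: assign the random number in one rule, then test the stored value
awk -F',' '{ a = rand() } a < 0.2 { print a }' sample.csv | head -n 5
```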

Answer

From the gawk documentation:

CAUTION: In most awk implementations, including gawk, rand() starts generating numbers from the same starting number, or seed, each time you run awk. Thus, a program generates the same results each time you run it. The numbers are random within one awk run but predictable from run to run. This is convenient for debugging, but if you want a program to do different things each time it is used, you must change the seed to a value that is different in each run. To do this, use srand().
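A minimal sketch of that fix, assuming a POSIX-compatible awk: call srand() once in a BEGIN block so the generator is seeded before any record is read. Called with no argument, srand() seeds from the time of day, so two runs started within the same second still get the same sequence; passing an explicit seed avoids that. The file sample.csv and the use of bash's $RANDOM as a seed source are assumptions for illustration.

```shell
# Hypothetical sample data standing in for the asker's toy.txt
seq 1 1000 | awk '{ print "row" $1 "," $1 }' > sample.csv

# srand() with no argument seeds from the time of day, so the count
# changes between runs (but not between runs in the same second)
awk -F',' 'BEGIN { srand() } rand() < 0.2 { print $2 }' sample.csv | wc -l

# An explicit seed also works; bash's $RANDOM supplies a different value
# per run (assumes bash; in plain sh $RANDOM may be empty)
awk -F',' -v seed="$RANDOM" 'BEGIN { srand(seed) } rand() < 0.2 { print $2 }' sample.csv | wc -l
```

Conversely, a fixed seed such as srand(42) gives the same sample on every run, which is the reproducible behavior the documentation describes as convenient for debugging.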