ryvnf ryvnf - 3 months ago 12
C Question

Should I use "rand % N" or "rand() / (RAND_MAX / N + 1)"?

I was reading the C FAQ and found out in a question that it recommends me to use

rand() / (RAND_MAX / N + 1)
instead of the more popular way which is
rand() % N
.

The reasoning for that is that when
N
is a low number
rand() % N
will only use a few bits from
rand()
.

I tested the different approaches with
N
being
2
on both Windows and Linux but could not notice a difference.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define N 2

int main(void)
{
srand(0);
printf("rand() %% N:\n");
for (int i = 0; i < 40; ++i) {
printf("%d ", rand() % N);
}
putchar('\n');

srand(0);
printf("rand() / (RAND_MAX / N + 1):\n");
for (int i = 0; i < 40; ++i) {
printf("%d ", rand() / (RAND_MAX / N + 1));
}
putchar('\n');

return 0;
}


The output is this (on my gnu/linux machine):

rand() % N:
1 0 1 1 1 1 0 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 1 1 0 0 0 1 1 1 1 0 0 0 1 1 1 0 1 0
rand() / (RAND_MAX / N + 1):
1 0 1 1 1 0 0 1 0 1 0 1 0 1 1 1 1 1 0 1 0 0 0 1 0 0 0 0 1 0 1 1 1 0 1 1 0 1 0 1


Both alternatives seem perfectly random to me. It even seems like the second approach is worse than
rand % N
.

Should I use
rand() % N
or
rand() / (RAND_MAX / N + 1)
?

Answer

If N is a power of two, using the remainder technique is usually safe (RAND_MAX is usually a power of two minus 1, so the entire range has a power of two length). More generally, N has to divide the range of rand() in order to avoid the bias.

Otherwise, you run into this problem, regardless of the quality of rand(). In short, the problem is that you're chopping that range into a number of "parts" each of length N, if N does not divide the range then the last part will not be complete. The numbers that got "cut off" from that part are therefore less likely to occur, since they have one fewer "part" they can be generated from.

Unfortunately rand() / (RAND_MAX / N + 1) is also broken (in almost the same way), so the real answer is: don't use either of them.

The problem as outlined above is really fundamental, there is no way to evenly distribute X different values over Y results unless Y divides X. You can fix it by rejecting a part of the random samples, to make Y divide the new X.