1 year ago
Python Question

Python number of rng seeds

In the docs (Python 3.5) for seeding the random number generator:

random.seed(a=None, version=2)

Initialize the random number generator.

If a is omitted or None, the current system time is used. If
randomness sources are provided by the operating system, they are used
instead of the system time (see the os.urandom() function for details
on availability).

If a is an int, it is used directly.

With version 2 (the default), a str, bytes, or bytearray object gets
converted to an int and all of its bits are used. With version 1, the
hash() of a is used instead.

It does not make it clear how many seeds there are. An int normally has only about 4 billion distinct values (32 bits), but Python's ints have arbitrary precision:

x = 1
type(x) # <class 'int'>
y = 123456789123456789123456789123456789123456789123456789123456789123456789
type(y) # <class 'int'>
z = x+y
z-y # 1 (no rounding error for a 71 digit number)

They say all of its bits are used, but that could mean that the bits are used to make a digest that is a normal 32-bit int. Why does this matter? I need to make random patterns from seeds. In turn, I need to make random sequences of patterns (the sequence in turn has a seed). A stream of random number generators will be subject to a "birthday attack": if the seed is only 32 bits, a duplicate is likely after a hundred thousand or so generators and almost certain after a few hundred thousand. Although this is not for cryptography, it's still undesirable.
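The birthday estimate above can be checked with a quick calculation. This is a minimal sketch, assuming seeds are drawn independently and uniformly at random; collision_probability is a hypothetical helper name, and it relies on the standard approximation p ≈ 1 - exp(-n(n-1)/(2N)) for n seeds drawn from N possible values.

```python
import math

def collision_probability(n_seeds, seed_bits):
    """Approximate chance that n_seeds uniformly random seeds contain a duplicate.

    Hypothetical helper for illustration; assumes each seed is drawn
    independently and uniformly from 2**seed_bits possible values.
    """
    space = 2 ** seed_bits
    # Birthday approximation: p ~ 1 - exp(-n(n-1) / (2N)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n_seeds * (n_seeds - 1) / (2 * space))

for n in (100000, 200000, 300000):
    print(n, round(collision_probability(n, 32), 4))
print(collision_probability(100000, 512))
```

Under these assumptions, 32-bit seeds give roughly a 69% chance of a duplicate at 100,000 seeds and over 99% at 200,000, while with 512-bit seeds the chance is negligible.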

Answer Source

What's great about open source is that you can simply look at the code when you have a question. This is the source of random.seed:

# Body of Random.seed(self, a=None, version=2)
if a is None:
    try:
        # Seed with enough bytes to span the 19937 bit
        # state space for the Mersenne Twister
        a = int.from_bytes(_urandom(2500), 'big')
    except NotImplementedError:
        import time
        a = int(time.time() * 256) # use fractional seconds

if version == 2:
    if isinstance(a, (str, bytes, bytearray)):
        if isinstance(a, str):
            a = a.encode()
        a += _sha512(a).digest()
        a = int.from_bytes(a, 'big')

super().seed(a)  # hand the (possibly very large) int to the C implementation
self.gauss_next = None

You can see that if version == 2 and a str/bytes/bytearray is provided, it computes the SHA-512 digest of a, appends it to a, and converts the result with int.from_bytes. This generates a very large int: the digest alone contributes 512 bits, even with very small custom inputs.
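The conversion described above can be reproduced by hand as a sanity check. This is a minimal sketch that mirrors the quoted source using hashlib; str_seed_to_int is an illustrative helper name, not part of the random module.

```python
import hashlib
import random

def str_seed_to_int(text):
    """Mirror the version-2 str conversion quoted above (illustrative helper)."""
    data = text.encode()
    data += hashlib.sha512(data).digest()  # append the 64-byte digest
    return int.from_bytes(data, 'big')

seed_int = str_seed_to_int("abc")
print(seed_int.bit_length())  # well over 32 bits

# Seeding with the string or with the derived int gives the same stream,
# because seed() performs this conversion before seeding.
assert random.Random("abc").random() == random.Random(seed_int).random()
```

If the assertion passes, the string seed really is expanded into a large int rather than hashed down to 32 bits.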

The end result is that an int seed is never truncated to 32 bits: the whole integer is used to initialize the Mersenne Twister's state, which holds 624 32-bit words (19937 bits).
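Whether the high bits of an int seed matter can also be tested empirically. The following is a small sketch, assuming CPython's Mersenne Twister implementation; first_values is an illustrative helper name. It compares seeds that share their low 32 bits and differ only above them.

```python
import random

def first_values(seed, count=5):
    """Return the first few outputs for a given seed (illustrative helper)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]

low = 12345
high = low + 2 ** 32      # same low 32 bits, differs only above bit 32
huge = low + 2 ** 1000    # differs only far beyond 64 bits

print(first_values(low) == first_values(high))
print(first_values(low) == first_values(huge))

# getstate()[1] holds the 624 state words plus one position index.
print(len(random.Random(low).getstate()[1]) - 1)
```

On CPython both comparisons should print False, which is consistent with the seed not being reduced to 32 bits before use.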