I need a huge boolean array. All values should be initialized to "True":
arr = [True] * (10 ** 9)
arr = bytearray(10 ** 9) # initialized with zeroes
>>> from timeit import timeit
>>> def f1():
...     return bytearray(10**9)
...
>>> def f2():
...     return bytearray(b'\x01'*(10**9))
...
>>> timeit(f1, number=100)
>>> timeit(f2, number=100)
Consider using NumPy for this sort of thing. On my computer,
np.ones (which initializes an array of all-1 values) with a boolean "dtype" is just as fast as the bare bytearray constructor:
>>> import numpy as np
>>> from timeit import timeit
>>> def f1(): return bytearray(10**9)
...
>>> def f2(): return np.ones(10**9, dtype=np.bool_)
...
>>> timeit(f1, number=100)
24.9679438900057
>>> timeit(f2, number=100)
24.732190757000353
If you don't want to use third-party modules, another option with competitive performance is to create a one-element
bytearray and then expand that, instead of creating a large byte-string and converting it to a bytearray.
>>> def f3(): return bytearray(b'\x01')*(10**9)
...
>>> timeit(f3, number=100)
24.842667759003234
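To check that expanding a one-element bytearray really produces an all-true array, here is a minimal sketch at a much smaller size. The helper name all_true_flags is made up for illustration, and note that the elements are the integers 1 and 0 rather than real bool objects, so this only works if truthiness is all you need.

```python
def all_true_flags(n: int) -> bytearray:
    """Return n one-byte flags, each set to 1 (truthy). Hypothetical helper."""
    # Repeating a one-element bytearray builds the result directly,
    # without first creating an n-byte bytes object.
    return bytearray(b'\x01') * n

flags = all_true_flags(1_000)   # small size so the example runs instantly
assert len(flags) == 1_000
assert all(flags)               # every element is 1, i.e. truthy

flags[10] = 0                   # clear one flag
print(flags[10], flags[11])     # 0 1
print(flags.count(1))           # 999
```

Each flag costs one byte, the same as in the bytearray(10**9) version from the question.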
Since my computer appears to be slower than yours, here is the performance of your original option for comparison:
>>> def fX(): return bytearray(b'\x01'*(10**9))
...
>>> timeit(fX, number=100)
56.61828187300125
The cost in all cases is dominated by allocating a decimal gigabyte (10**9 bytes) of RAM and writing to every byte of it.
fX takes roughly twice as long as the other three functions because it has to do this twice: once for the temporary bytes object and once for the bytearray copied from it. A good rule of thumb when working with code like this is to minimize the number of allocations. It may be worth dropping down to a lower-level language in which you can explicitly control allocation (if you don't know any such language already, I recommend Rust).
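If you want to see the double allocation rather than infer it from timings, the standard-library tracemalloc module can report peak memory. The following is a rough sketch, assuming CPython and using N = 10**6 instead of 10**9 so it runs quickly. The names peak_bytes, two_step and one_step are mine, not part of any library.

```python
import tracemalloc

N = 10**6  # scaled down from 10**9 so the sketch runs quickly

def peak_bytes(func):
    """Return the peak traced allocation (in bytes) while func runs."""
    tracemalloc.start()
    result = func()            # keep the result alive during measurement
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del result
    return peak

def two_step():
    # Builds an N-byte bytes object, then copies it into a bytearray.
    return bytearray(b'\x01' * N)

def one_step():
    # Builds the N-byte bytearray directly from a one-element bytearray.
    return bytearray(b'\x01') * N

peak_two = peak_bytes(two_step)
peak_one = peak_bytes(one_step)
print(f"two_step peak: {peak_two / N:.2f} x N bytes")
print(f"one_step peak: {peak_one / N:.2f} x N bytes")
```

On CPython I would expect two_step to peak at about twice N and one_step at about N, which matches the factor of two in the timings, but the exact figures depend on the interpreter version.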