DJV - 3 days ago
Python Question

Is it possible to efficiently initialize a bytearray with a non-zero value?

I need a huge boolean array in which every value is initialized to True:

arr = [True] * (10 ** 9)


Created this way, however, it takes too much memory, so I decided to use a bytearray instead:

arr = bytearray(10 ** 9) # initialized with zeroes


Is it possible to initialize a bytearray with b'\x01' as efficiently as it is initialized with b'\x00'?

I understand I could initialize the bytearray with zeros and invert my logic, but I'd prefer not to do that if possible.

timeit:

>>> from timeit import timeit
>>> def f1():
...     return bytearray(10**9)
...
>>> def f2():
...     return bytearray(b'\x01'*(10**9))
...
>>> timeit(f1, number=100)
14.117428014000325
>>> timeit(f2, number=100)
51.42543800899875

Answer

Consider using NumPy for this sort of thing. On my computer, np.ones (which creates an array with every element set to 1) with a boolean dtype is just as fast as the bare bytearray constructor:

>>> import numpy as np
>>> from timeit import timeit
>>> def f1(): return bytearray(10**9)
>>> def f2(): return np.ones(10**9, dtype=bool)
>>> timeit(f1, number=100)
24.9679438900057
>>> timeit(f2, number=100)
24.732190757000353

If you don't want to use third-party modules, another option with competitive performance is to create a one-element bytearray and then expand that, instead of creating a large byte-string and converting it to a bytearray.

>>> def f3(): return bytearray(b'\x01')*(10**9)
>>> timeit(f3, number=100)
24.842667759003234
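As a quick sanity check, here is a minimal sketch showing that the repeated one-element bytearray really can serve as a mutable array of true flags. The size n and the names flags, ones_before and ones_after are chosen for illustration only, and n is deliberately far smaller than the 10**9 in the question so that it runs quickly.

```python
# Illustrative size only; the question uses 10**9.
n = 10**6

# Repeat a one-element bytearray instead of converting a large bytes object.
flags = bytearray(b'\x01') * n

# Every element starts out as 1, which is truthy.
ones_before = flags.count(1)

# Elements can be cleared in place, as with a list of booleans.
flags[42] = 0
ones_after = flags.count(1)

print(len(flags), ones_before, ones_after, bool(flags[42]))
```

Each element is an int in the range 0 to 255, so a test such as "if flags[i]:" behaves the same way it would with a list of True/False values.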

Since my computer appears to be slower than yours, here is the performance of your original option for comparison:

>>> def fX(): return bytearray(b'\x01'*(10**9))
>>> timeit(fX, number=100)
56.61828187300125

Cost in all cases is going to be dominated by allocating a decimal gigabyte (10**9 bytes) of RAM and writing to every byte of it. fX takes roughly twice as long as the other three functions because it has to do this twice: once to build the byte string and again to copy it into the bytearray. A good rule of thumb to remember when working with code like this: minimize the number of allocations. It may be worth dropping down to a lower-level language in which you can control allocation explicitly (if you don't already know such a language, I recommend Rust).
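One rough way to see the extra allocation is to compare peak traced memory with the standard-library tracemalloc module. This is only a sketch, assuming CPython (where both bytes and bytearray buffers go through allocators that tracemalloc can observe). The function names two_step, one_step and peak_traced_bytes are hypothetical helpers, and n is kept well below 10**9 so the check runs quickly.

```python
import tracemalloc


def two_step(n):
    # Builds an n-byte bytes object first, then copies it into a bytearray.
    return bytearray(b'\x01' * n)


def one_step(n):
    # Repeats a one-element bytearray, allocating the large buffer only once.
    return bytearray(b'\x01') * n


def peak_traced_bytes(build, n):
    # Hypothetical helper: peak memory traced while build(n) runs.
    tracemalloc.start()
    result = build(n)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del result
    return peak


n = 10**7  # illustrative size, much smaller than in the question
peak_two = peak_traced_bytes(two_step, n)
peak_one = peak_traced_bytes(one_step, n)

print(f"two_step peak: {peak_two / n:.2f} bytes per element")
print(f"one_step peak: {peak_one / n:.2f} bytes per element")
```

I would expect the two-step version to peak noticeably higher, since the temporary byte string and the bytearray are alive at the same time, but the exact figures depend on the interpreter version and platform.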
