DJV - 3 days ago 4
Python Question

# Is it possible to effectively initialize bytearray with non-zero value?

I need to have huge boolean array. All values should be initialized as "True":

``````arr = [True] * (10 ** 9)
``````

But created as above it takes too much memory. So I decided to use
`bytearray`
for that:

``````arr = bytearray(10 ** 9)  # initialized with zeroes
``````

Is it possible to initialize
`bytearray`
with
`b'\x01'`
as effectively as it is initialized by
`b'\x00'`
?

I understand I could initialize
`bytearray`
with zeros and inverse my logic. But I'd prefer not to do that if possible.

timeit:

``````>>> from timeit import timeit
>>> def f1():
...   return bytearray(10**9)
...
>>> def f2():
...   return bytearray(b'\x01'*(10**9))
...
>>> timeit(f1, number=100)
14.117428014000325
>>> timeit(f2, number=100)
51.42543800899875
``````

Consider using NumPy for this sort of thing. On my computer, `np.ones` (which initializes an array of all-1 values) with boolean "dtype" is just as fast as the bare `bytearray` constructor:

``````>>> import numpy as np
>>> from timeit import timeit
>>> def f1(): return bytearray(10**9)
>>> def f2(): return np.ones(10**9, dtype=np.bool)
>>> timeit(f1, number=100)
24.9679438900057
>>> timeit(f2, number=100)
24.732190757000353
``````

If you don't want to use third-party modules, another option with competitive performance is to create a one-element `bytearray` and then expand that, instead of creating a large byte-string and converting it to a bytearray.

``````>>> def f3(): return bytearray(b'\x01')*(10**9)
>>> timeit(f3, number=100)
24.842667759003234
``````

Since my computer appears to be slower than yours, here is the performance of your original option for comparison:

``````>>> def fX(): return bytearray(b'\x01'*(10**9))
>>> timeit(fX, number=100)
56.61828187300125
``````

Cost in all cases is going to be dominated by allocating a decimal gigabyte of RAM and writing to every byte of it. `fX` is roughly twice as slow as the other three functions because it has to do this twice. A good rule of thumb for you to remember when working with code like this is: minimize the number of allocations. It may be worth dropping down to a lower-level language in which you can explicitly control allocation (if you don't know any such language already, I recommend Rust).