user994144 - 5 months ago 19x
Python Question

fastest way to convert bitstring numpy array to integer base 2

I have a numpy array of bitstrings, and I want to convert each bitstring to an integer (interpreting it as base 2) so that I can perform bitwise XOR operations on the results. In plain Python I can convert a single string like this:

```
int('000011000', 2)
```

I am wondering whether there is a faster and better way to do this in numpy. An example of the kind of array I am working with:

```
array([['0001'],
       ['0010']],
      dtype='|S4')
```

and I expect to convert it to:

```
array([[1],[2]])
```
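For reference, the plain-Python conversion from the question can be applied element by element to get the expected result. This is only a baseline sketch of the starting point, not a fast solution; the name `baseline` is introduced here for illustration.

```python
import numpy as np

# Example input from the question: byte strings of width 4
A = np.array([['0001'], ['0010']], dtype='|S4')

# int() accepts bytes as well as str when an explicit base is given,
# so each element can be parsed directly as a base-2 number.
baseline = np.array([[int(row[0], 2)] for row in A])

print(baseline)
```

Every element goes through a Python-level `int()` call here, which is the per-element overhead a vectorized approach tries to avoid.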

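One could use `np.fromstring` to split each string into its individual characters as `uint8` ASCII codes, subtract 48 (the code of `'0'`) to get the bit values, and then use a matrix multiplication with the powers of two to reduce each row to a single integer. Note that `np.fromstring` on binary data is deprecated in newer NumPy releases, where `np.frombuffer` is the suggested replacement. With `A` as the input array, one approach would be like so -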

```
# Convert each bit of input string to numerals
str2num = (np.fromstring(A, dtype=np.uint8)-48).reshape(-1,4)

# Setup conversion array for binary number to decimal equivalent
de2bi_convarr = 2**np.arange(3,-1,-1)

# Use matrix multiplication for reducing each row of str2num to a single decimal
out = str2num.dot(de2bi_convarr)
```
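On recent NumPy versions the same idea can be written without `np.fromstring`. The following is a minimal sketch, not the answer's original code: it swaps in a `uint8` view of the array's bytes, and it assumes every string fills the full item width (shorter strings are padded with null bytes and would break the subtraction). The names `n_bits`, `bits`, and `weights` are introduced here for illustration.

```python
import numpy as np

A = np.array([['0001'], ['1001'], ['1100'], ['0010']], dtype='|S4')

# Width of each bitstring in bytes (4 for dtype '|S4')
n_bits = A.dtype.itemsize

# Reinterpret the raw bytes as uint8 ASCII codes, one row per string,
# then subtract the code of '0' to obtain 0/1 bit values.
bits = np.ascontiguousarray(A).view(np.uint8).reshape(-1, n_bits) - ord('0')

# Powers of two, most significant bit first: [8, 4, 2, 1]
weights = 2 ** np.arange(n_bits - 1, -1, -1)

# Weighted sum of each row gives the integer value
out = bits.dot(weights)
print(out)                           # [ 1  9 12  2]

# The integers can now be used in bitwise operations such as XOR
print(np.bitwise_xor(out, 0b1111))   # [14  6  3 13]
```

Deriving the width from `A.dtype.itemsize` instead of hard-coding 4 lets the same sketch handle other fixed-width byte-string arrays.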

Sample run -

```
In [113]: A    # Modified to show more variety
Out[113]:
array([['0001'],
       ['1001'],
       ['1100'],
       ['0010']],
      dtype='|S4')

In [114]: str2num = (np.fromstring(A, dtype=np.uint8)-48).reshape(-1,4)

In [115]: str2num
Out[115]:
array([[0, 0, 0, 1],
       [1, 0, 0, 1],
       [1, 1, 0, 0],
       [0, 0, 1, 0]], dtype=uint8)

In [116]: de2bi_convarr = 2**np.arange(3,-1,-1)

In [117]: de2bi_convarr
Out[117]: array([8, 4, 2, 1])

In [118]: out = str2num.dot(de2bi_convarr)

In [119]: out
Out[119]: array([ 1,  9, 12,  2])
```

An alternative method avoids `np.fromstring` altogether. Here we first convert the strings to an int datatype, so that `'1001'` becomes the decimal number 1001, and then separate out each decimal digit with integer division and a remainder by 10. The result is equivalent to `str2num` in the previous method, and the rest of the code stays the same. Thus, an alternative implementation would be -

```
# Convert to int array and thus convert each bit of input string to numerals
# (plain int is used because the np.int alias was removed from newer NumPy)
str2num = np.remainder(A.astype(int)//(10**np.arange(3,-1,-1)),10)

de2bi_convarr = 2**np.arange(3,-1,-1)
out = str2num.dot(de2bi_convarr)
```
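To make the digit-separation step concrete, here is a small worked sketch of the intermediate arrays for the four-string sample input. The variable names `as_int`, `powers`, and `digits` are introduced here for illustration only.

```python
import numpy as np

A = np.array([['0001'], ['1001'], ['1100'], ['0010']], dtype='|S4')

# Parse each string as a decimal number: [[1], [1001], [1100], [10]]
as_int = A.astype(int)

# Decimal place values, most significant first: [1000, 100, 10, 1]
powers = 10 ** np.arange(3, -1, -1)

# Broadcasting (n, 1) against (4,) gives (n, 4); the remainder by 10
# keeps only the digit at each decimal place.
digits = (as_int // powers) % 10
print(digits)

# The same weighted sum as before turns the digits into integers
out = digits.dot(2 ** np.arange(3, -1, -1))
print(out)   # [ 1  9 12  2]
```

One caveat of this route: because each bitstring is first read as a decimal number, long bitstrings produce an intermediate value that no longer fits in a 64-bit integer.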

Runtime tests

Let's time all the approaches listed so far, including @Kasramvd's loop-based solution.

```
In [198]: # Setup a huge array of such strings
     ...: A = np.array([['0001'],['1001'],['1100'],['0010']],dtype='|S4')
     ...: A = A.repeat(10000,axis=0)

In [199]: def app1(A):
     ...:     str2num = (np.fromstring(A, dtype=np.uint8)-48).reshape(-1,4)
     ...:     de2bi_convarr = 2**np.arange(3,-1,-1)
     ...:     out = str2num.dot(de2bi_convarr)
     ...:     return out
     ...:
     ...: def app2(A):
     ...:     str2num = np.remainder(A.astype(int)//(10**np.arange(3,-1,-1)),10)
     ...:     de2bi_convarr = 2**np.arange(3,-1,-1)
     ...:     out = str2num.dot(de2bi_convarr)
     ...:     return out
     ...:

In [200]: %timeit app1(A)
1000 loops, best of 3: 1.46 ms per loop

In [201]: %timeit app2(A)
10 loops, best of 3: 36.6 ms per loop

In [202]: %timeit np.array([[int(i[0], 2)] for i in A]) # @Kasramvd's solution
10 loops, best of 3: 61.6 ms per loop
```
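Outside IPython, a comparison along these lines can be run with the standard `timeit` module. The sketch below is an assumption-laden stand-in rather than a reproduction of the session above: it uses a smaller array, replaces `np.fromstring` with a `uint8` view, and uses the hypothetical function names `bytes_view`, `decimal_digits`, and `python_loop`. Absolute timings depend on the machine and NumPy version, so no expected numbers are given.

```python
import timeit
import numpy as np

A = np.array([['0001'], ['1001'], ['1100'], ['0010']], dtype='|S4')
A = A.repeat(1000, axis=0)
weights = 2 ** np.arange(3, -1, -1)

def bytes_view(A):
    # View the raw bytes as ASCII codes and subtract the code of '0'
    return (A.view(np.uint8).reshape(-1, 4) - 48).dot(weights)

def decimal_digits(A):
    # Parse as decimal integers, then peel off each decimal digit
    return ((A.astype(int) // 10 ** np.arange(3, -1, -1)) % 10).dot(weights)

def python_loop(A):
    # Element-by-element conversion with the built-in int()
    return np.array([int(row[0], 2) for row in A])

# Check that all three approaches agree before timing them
reference = python_loop(A)
for func in (bytes_view, decimal_digits):
    assert np.array_equal(func(A), reference)

for func in (bytes_view, decimal_digits, python_loop):
    seconds = timeit.timeit(lambda: func(A), number=5) / 5
    print(f"{func.__name__}: {seconds * 1e3:.3f} ms per call")
```

Verifying that the results match before timing guards against comparing the speed of functions that do not compute the same thing.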