SlowerPhoton - 6 months ago 43

Python Question

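On the input I have a signed array of bytes `barr` (little-endian in the code below) and a factor `f` that I need to multiply `barr` by.

My approach is to convert `barr` into an integer `val` with `int.from_bytes`, multiply `val` by `f`, clamp the result to the range that fits in the original number of bytes, and convert it back to bytes: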

```
def multiply(barr, f):
    val = int.from_bytes(barr, byteorder='little', signed=True)
    val *= f
    val = int(val)
    val = cropInt(val, bitLen=len(barr) * 8)
    barr = val.to_bytes(len(barr), byteorder='little', signed=True)
    return barr

def cropInt(integer, bitLen, signed=True):
    maxValue = (2**(bitLen-1)-1) if signed else (2**(bitLen)-1)
    minValue = -maxValue-1 if signed else 0
    if integer > maxValue:
        integer = maxValue
    if integer < minValue:
        integer = minValue
    return integer
```

However this process is extremely slow when processing a large amount of data. Is there a better, more efficient way to do that?

Answer

Pure Python is rather inefficient for numeric calculations: because each number is treated as an object, every operation involves a lot of steps "under the hood".

On the other hand, Python can be very effective for numeric calculation if you use the appropriate set of third party libraries.
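To make the "every number is an object" point concrete, here is a minimal sketch comparing one Python `int` object with one element of a NumPy `int8` array. The exact object size depends on the Python build, so no numbers are claimed here; the variable names are illustrative only.

```python
import sys
import numpy as np

values = list(range(-128, 128))          # 256 small integers, each a Python int object
array = np.array(values, dtype=np.int8)  # the same values packed into one contiguous byte buffer

# In CPython each list element is a separate heap object reached through a pointer,
# while the array stores just the raw 1-byte values.
print("size of one Python int object:", sys.getsizeof(values[0]), "bytes")
print("size of one int8 array element:", array.itemsize, "byte")
print("total array payload:", array.nbytes, "bytes for", len(values), "values")
```

Operations on `array` run in compiled loops over that buffer, instead of unpacking and re-creating an object for every value.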

In your case, since performance matters, you can make use of `NumPy`, the de facto Python package for numeric processing.

With it, the casting, multiplication and recasting are each done in native code in a single pass (someone who knows NumPy better than I do could probably do it in even fewer steps), which should give you a speedup of three to four orders of magnitude for this task:

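```
import numpy as np

def multiply(all_bytes, f, bitlen, signed=True):
    # Intended for 8, 16 and 32 bit integers (64-bit values can lose
    # precision in the float64 product).
    # NumPy type name such as "int16" or "uint8", forced to little-endian
    # to match byteorder='little' in the original code.
    dtype = np.dtype("%sint%d" % ("" if signed else "u", bitlen)).newbyteorder("<")
    limits = np.iinfo(dtype)  # smallest and largest representable values
    input_data = np.frombuffer(all_bytes, dtype=dtype)
    # Multiply in float64 so the product cannot wrap around in the integer type.
    product = input_data.astype(np.float64) * f
    # Saturate to the type's range, like cropInt (negative bound included).
    processed = np.clip(product, limits.min, limits.max)
    # astype truncates toward zero, like int() in the original code.
    return processed.astype(dtype).tobytes()
```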

Please note that this example takes all of your byte data at once, not one value at a time as you pass it to your original `multiply` function. Therefore you also have to pass it the size of your integers in bits.

The line that builds `dtype` creates the data-type name NumPy uses from the number of bits passed in. Since the name is just a string, it is built by interpolation: a "u" prefix is added when the type is unsigned, and the number of bits goes at the end (for example `int16` or `uint8`). NumPy data types are listed at: https://docs.scipy.org/doc/numpy/user/basics.types.html
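A small sketch of that name construction, with the "u" prefix for unsigned types, can be checked against NumPy's own limits. `np.iinfo` reports the same bounds that `cropInt` computes by hand (`2**(bitLen-1)-1` and `-maxValue-1` for signed types). The helper name `dtype_name` is hypothetical, used only for illustration.

```python
import numpy as np

def dtype_name(bitlen, signed=True):
    # Hypothetical helper: "%sint%d" interpolation, adding "u" only for unsigned types.
    return "%sint%d" % ("" if signed else "u", bitlen)

for bitlen in (8, 16, 32, 64):
    for signed in (True, False):
        name = dtype_name(bitlen, signed)
        info = np.iinfo(name)  # the clipping bounds for this integer type
        print(name, info.min, info.max)
```

The first line printed is `int8 -128 127`, the same range `cropInt(..., bitLen=8)` clamps to.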

Running it on an array of 500,000 8-bit signed integers, I get these timings:

```
In [99]: %time y = numpy_multiply(data, 1.7, 8)
CPU times: user 3.01 ms, sys: 4.96 ms, total: 7.97 ms
Wall time: 7.38 ms

In [100]: %time x = original_multiply(data, 1.7, 8)
CPU times: user 11.3 s, sys: 1.86 ms, total: 11.3 s
Wall time: 11.3 s
```

(This is after modifying your function to operate on all the bytes at once as well.) That is a speedup of about 1500 times, within the range estimated above.
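The modified pure-Python function used for that timing is not shown. A hypothetical reconstruction, assuming it simply applies the question's per-value logic to each `bitlen // 8`-byte slice of the buffer, might look like this (the name `original_multiply` is borrowed from the timing output; the body is an assumption):

```python
def original_multiply(all_bytes, f, bitlen, signed=True):
    # Hypothetical reconstruction: the question's per-value logic applied
    # to every bitlen//8-byte slice of the buffer, in pure Python.
    width = bitlen // 8
    if signed:
        max_value = 2 ** (bitlen - 1) - 1
        min_value = -max_value - 1
    else:
        max_value = 2 ** bitlen - 1
        min_value = 0
    out = bytearray()
    for start in range(0, len(all_bytes), width):
        chunk = all_bytes[start:start + width]
        val = int.from_bytes(chunk, byteorder="little", signed=signed)
        val = int(val * f)                         # truncates toward zero, like int(val)
        val = min(max(val, min_value), max_value)  # same clamp as cropInt
        out += val.to_bytes(width, byteorder="little", signed=signed)
    return bytes(out)
```

For example, `original_multiply(bytes([0x10, 0xF0, 0x7F]), 2.0, 8)` doubles 16 and -16 and saturates 127, returning `bytes([0x20, 0xE0, 0x7F])`. Every value still goes through several Python-level object operations, which is where the time goes.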