Gil Barash Gil Barash - 3 years ago 161
Python Question

Python - "xor"ing each byte in "bytes" in the most efficient way

I have a "bytes" object and an "int" mask, I want to do a xor over all the bytes with my mask.
I do this action repeatedly over big "bytes" objects (~ 4096 KB).

This is the code I have which does the work well, only it is very CPU intensive and slows down my script:

# 'data' is bytes and 'mask' is int
bmask = struct.pack('!I', mask) # converting the "int" mask to "bytes" of 4 bytes
a = bytes(b ^ m for b, m in zip(data, itertools.cycle(bmask)))

The best I could come up with is this, which is about 20 times faster:

# 'data' is bytes and 'mask' is int
# reversing the bytes of the mask
bmask = struct.pack("<I", mask)
mask = struct.unpack(">I", bmask)[0]

# converting from bytes to array of "int"s
arr = array.array("I", data)

# looping over the "int"s
for i in range(len(arr)):
arr[i] ^= mask

# must return bytes
a = bytes(arr)

My questions are:

  1. Is there a more efficient way to do this (CPU-wize)?

  2. Is there a "cleaner" way to do this (without hurting performance)?

P.S. if it is of any importance, I'm using Python 3.5

Answer Source

You can't get much faster than your algorithm using pure Python. You can shave off a tiny bit of time by doing the looping in a list comprehension, but then that list needs to be converted back into an array, and the array has to be converted to bytes, so it uses more RAM for a negligible benefit.

However, you can make this much faster by using Numpy. Here's a short demo.

from time import perf_counter
from random import randrange, seed
import array
import numpy as np


def timed(func):
    ''' Timing decorator '''
    def wrapped(*args):
        start = perf_counter()
        result = func(*args)
        stop = perf_counter()
        print('{}: {:.6f} seconds'.format(func.__name__, stop - start))
        return result
    wrapped.__name__ = func.__name__
    wrapped.__doc__ = func.__doc__
    return wrapped

def do_mask_arr1(data, mask):
    arr = array.array("I", data)
    # looping over the "int"s
    for i in range(len(arr)):
        arr[i] ^= mask
    return arr.tobytes()

def do_mask_arr2(data, mask):
    arr = array.array("I", data)
    return array.array("I", [u ^ mask for u in arr]).tobytes()

def do_mask_numpy(data, mask):
    return (np.fromstring(data, dtype=np.uint32) ^ mask).tobytes()

def make_data(datasize):
    ''' Make some random bytes '''
    return bytes(randrange(256) for _ in range(datasize))

datasize = 100000
mask = 0x12345678
data = make_data(datasize)

d1 = do_mask_arr1(data, mask)
d2 = do_mask_arr2(data, mask)
print(d1 == d2)

d3 = do_mask_numpy(data, mask)
print(d1 == d3)

typical output

make_data: 0.751557 seconds
do_mask_arr1: 0.026865 seconds
do_mask_arr2: 0.025110 seconds
do_mask_numpy: 0.000438 seconds

Tested using Python 3.6.0 on an old single core 32 bit 2GHz machine running on Linux.

I just did a run with datasize = 4000000 and do_mask_numpy took 0.0422 seconds.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download