aph aph - 5 months ago 25
Python Question

Element-wise weave of two arrays used with numpy repeat

I have two arrays of unequal length val1 and val2 that I am trying to weave together in a specific way that is defined by the equal-length arrays mult1 and mult2. In general, my arrays are very long (~1e6 elements), and this is a performance-critical bottleneck in my calculation, so I cannot afford to do a python-for loop and so I am trying to take advantage of vectorized functions in Numpy. For the sake of being explicit:

mult1 = np.array([0, 1, 2, 1, 0])
mult2 = np.array([1, 0, 1, 1, 0])

val1 = np.array([1, 2, 3, 4])
val2 = np.array([-1, -2, -3])

desired_final_result = np.array([-1, 1, 2, 3, -2, 4, -3])


The weaving of val1 and val2 is defined by the following element-wise procession through the indices of mult1 and mult2. Each entry of the two mult arrays defines how many elements to choose from the corresponding val array. We proceed element-wise through the mult arrays; the value of mult1[i] determines how many entries we choose from val1, then we proceed to the value of mult2[i] to select the appropriate number of val2 entries, always choosing the val1 entries to come first for each index i.

Note that len(val1) = mult1.sum() and len(val2) = mult2.sum(), so we always end up with a final array with len(desired_final_result) = len(val1) + len(val2).

Explicit explanation of minimal example




  • Since entry i=0 of mult1 is 0, we select 0 entries from val1 and move on to entry i=0 of mult2, which is 1, so we select 1 entry from val2. This explains why the first entry of desired_final_result is -1.

  • Since entry i=1 of mult1 is 1, we select 1 entry from val1 and move on to entry i=1 of mult2, which is 0, so select 0 entries from val2. This explains why the second entry of desired_final_result is 1.

  • Since entry i=2 of mult1 is 2, we select the next 2 entries from val1 and move on to entry i=2 of mult2, which is 1, so we select the next 1 entry from val2. This explains why entries 2-4 of desired_final_result are 2, 3, -2.

  • Since entry i=3 of mult1 is 1, we select the next 1 entry from val1 and move on to entry i=3 of mult2, which is also 1, so we select the next 1 entry from val2. This explains why entries 5-6 of desired_final_result are 4, -3.

  • Finally, since the i=4 of both mult1 and mult2 is 0, we have nothing left to do and our array is filled.



Question



Is there a way to use vectorized functions such as np.repeat and/or np.choose to solve my problem? Or do I need to resort to coding this calculation up in C and wrapping it into python?

Answer

Creating a Boolean index into the result array:

mult = np.array([mult1, mult2]).ravel('F')
tftf = np.tile([True, False], len(mult1))
mask = np.repeat(tftf, mult)

result = np.empty(len(val1) + len(val2), int)
result[ mask] = val1
result[~mask] = val2