Chris - 5 months ago
Python Question

Apply function to an array of tuples

I have a function that I would like to apply to an array of tuples and I am wondering if there is a clean way to do it.

Normally, I could use np.vectorize to apply the function to each item in the array. In this case, however, "each item" is a tuple, so numpy interprets the array as a 3d array and applies the function to each item within the tuple.

So I can assume that the incoming array is one of:


  1. tuple

  2. 1 dimensional array of tuples

  3. 2 dimensional array of tuples



I could probably write some looping logic, but numpy most likely has something that does this more efficiently, and I don't want to reinvent the wheel.

This is an example. I am trying to apply the tuple_converter function to each tuple in the array.

import numpy as np

array_of_tuples1 = np.array([
    [(1,2,3),(2,3,4),(5,6,7)],
    [(7,2,3),(2,6,4),(5,6,6)],
    [(8,2,3),(2,5,4),(7,6,7)],
])

array_of_tuples2 = np.array([
    (1,2,3),(2,3,4),(5,6,7),
])

plain_tuple = (1,2,3)


# Convert each tuple to a single number
def tuple_converter(tup):
    return tup[0]**2 + tup[1] + tup[2]

# Vectorizing applies the formula to each integer rather than each tuple
tuple_converter_vectorized = np.vectorize(tuple_converter)

# Each of these calls raises the IndexError shown below
print(tuple_converter_vectorized(array_of_tuples1))
print(tuple_converter_vectorized(array_of_tuples2))
print(tuple_converter_vectorized(plain_tuple))


Desired Output for array_of_tuples1:

[[ 6 11 38]
 [54 14 37]
 [69 13 62]]


Desired Output for array_of_tuples2:

[ 6 11 38]


Desired Output for plain_tuple:

6


But the code above produces this error, because it is trying to apply the function to an integer rather than a tuple:

<ipython-input-209-fdf78c6f4b13> in tuple_converter(tup)
10
11 def tuple_converter(tup):
---> 12 return tup[0]**2 + tup[1] + tup[2]
13
14

IndexError: invalid index to scalar variable.

Answer

array_of_tuples1 and array_of_tuples2 are not actually arrays of tuples, but just 3- and 2-dimensional arrays of integers:

In [1]: array_of_tuples1 = np.array([
   ...:         [(1,2,3),(2,3,4),(5,6,7)],
   ...:         [(7,2,3),(2,6,4),(5,6,6)],
   ...:         [(8,2,3),(2,5,4),(7,6,7)],
   ...:     ])

In [2]: array_of_tuples1
Out[2]: 
array([[[1, 2, 3],
        [2, 3, 4],
        [5, 6, 7]],

       [[7, 2, 3],
        [2, 6, 4],
        [5, 6, 6]],

       [[8, 2, 3],
        [2, 5, 4],
        [7, 6, 7]]])
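
A quick way to confirm this is to look at the shape and dtype. The snippet below is a minimal sketch; the names shape1, shape2 and is_integer are only there for illustration. It shows that numpy unpacked each tuple into a trailing axis of length 3 and stored plain integers:

```python
import numpy as np

array_of_tuples1 = np.array([
    [(1, 2, 3), (2, 3, 4), (5, 6, 7)],
    [(7, 2, 3), (2, 6, 4), (5, 6, 6)],
    [(8, 2, 3), (2, 5, 4), (7, 6, 7)],
])
array_of_tuples2 = np.array([(1, 2, 3), (2, 3, 4), (5, 6, 7)])

# Each tuple became a row along a new last axis of length 3.
shape1 = array_of_tuples1.shape  # (3, 3, 3)
shape2 = array_of_tuples2.shape  # (3, 3)

# The elements are integers, not Python tuple objects.
is_integer = np.issubdtype(array_of_tuples1.dtype, np.integer)

print(shape1, shape2, is_integer)
```

In both arrays the "tuple" lives on the last axis, which is why that axis is the one to apply the function along.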

np.vectorize essentially runs a for-loop over the elements of the array, which here are integers, so it is not the right tool. Instead, apply the function along the appropriate axis (the axis holding the "tuples"), without worrying about the type of the sequence:

In [6]: np.apply_along_axis(tuple_converter, 2, array_of_tuples1)
Out[6]: 
array([[ 6, 11, 38],
       [54, 14, 37],
       [69, 13, 62]])

In [9]: np.apply_along_axis(tuple_converter, 1, array_of_tuples2)
Out[9]: array([ 6, 11, 38])
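
The two calls above use a different axis number for each input, and the plain tuple is not covered. Since the "tuple" is always on the last axis, one possible way to handle all three cases is to convert the input with np.asarray and always use axis -1. The sketch below assumes that layout; apply_to_tuples and tuple_converter_sliced are hypothetical helper names, not numpy functions. It also shows two alternatives: np.vectorize with a signature argument (available in reasonably recent numpy versions), and plain slicing, which avoids the Python-level loop altogether when the formula can be written with array operations.

```python
import numpy as np


def tuple_converter(tup):
    return tup[0]**2 + tup[1] + tup[2]


def apply_to_tuples(func, data):
    """Hypothetical helper: apply func to each 1-D slice on the last axis."""
    arr = np.asarray(data)  # a plain tuple becomes a 1-D array
    return np.apply_along_axis(func, -1, arr)


# Alternative: tell np.vectorize that func consumes a whole 1-D slice
# and returns a scalar, instead of being called on single elements.
tuple_converter_vectorized = np.vectorize(tuple_converter, signature='(n)->()')


def tuple_converter_sliced(data):
    """Hypothetical helper: the same formula written with array slicing."""
    arr = np.asarray(data)
    return arr[..., 0]**2 + arr[..., 1] + arr[..., 2]


array_of_tuples1 = np.array([
    [(1, 2, 3), (2, 3, 4), (5, 6, 7)],
    [(7, 2, 3), (2, 6, 4), (5, 6, 6)],
    [(8, 2, 3), (2, 5, 4), (7, 6, 7)],
])
array_of_tuples2 = np.array([(1, 2, 3), (2, 3, 4), (5, 6, 7)])
plain_tuple = (1, 2, 3)

for data in (array_of_tuples1, array_of_tuples2, plain_tuple):
    print(apply_to_tuples(tuple_converter, data))
```

For the plain tuple the result comes back as a numpy scalar or 0-dimensional array rather than a Python int, so wrap it in int() if that matters. Note that apply_along_axis and np.vectorize still call the Python function once per tuple; only the slicing version is vectorized in the performance sense.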