C8H10N4O2 - 1 month ago
Python Question

broadcasting a comparison of a column of 2d array with multiple columns

What's the right numpy syntax to compare one column against others in a 2d ndarray?

After reading some docs on array broadcasting, I am still not quite sure what the correct way to do this is.

Example: Suppose I have a 2d array of goals scored by each player (row) in each game (column).

import numpy as np

# goals[i, j] = number of goals scored by ith player in jth game (NaN if player did not play)
# column = game
goals = np.array([[np.nan, 0, 1],       # row = player
                  [1,      2, 0],
                  [0,      0, np.nan],
                  [np.nan, 1, 1],
                  [0,      0, 1]])

I want to know if, in the final game, the player achieved a personal record by scoring more goals than she did in any previous game, ignoring games in which she did not appear (represented as NaN). I expect True for only the first and last players in the array.

Just writing
goals[:,2] > goals[:,:2]
raises
ValueError: operands could not be broadcast together with shapes (5,) (5,2)
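
A minimal sketch of why those two shapes clash, using the same goals array as above. The variable names last, previous, raised and compared are only illustrative and are not part of the original post.

```python
import numpy as np

goals = np.array([[np.nan, 0, 1],
                  [1,      2, 0],
                  [0,      0, np.nan],
                  [np.nan, 1, 1],
                  [0,      0, 1]])

last = goals[:, 2]       # shape (5,)
previous = goals[:, :2]  # shape (5, 2)

# Broadcasting aligns shapes from the right, so (5,) is treated as (1, 5).
# Its trailing 5 cannot be matched against the trailing 2 of (5, 2).
try:
    last > previous
    raised = False
except ValueError:
    raised = True

# A trailing singleton axis gives (5, 1), which stretches across (5, 2).
with np.errstate(invalid='ignore'):  # NaN comparisons may warn on some versions
    compared = last[:, np.newaxis] > previous

print(raised, compared.shape)
```

The result is still one boolean per (player, previous game) pair, so a row-wise reduction is needed afterwards.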

What I tried: I know that I can manually stretch the (5,) column into a (5,1) array with np.newaxis. So this works:

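with np.errstate(invalid='ignore'):  # silence warnings from comparing against NaN
    personalBest = (np.isnan(goals[:,:2]) |                   # game not played: ignore it
                    (goals[:,2][:,np.newaxis] > goals[:,:2])  # final game beats this game
                   ).all(axis=1)                              # must hold for every previous game

print(personalBest) # returns desired solution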

Is there a less hacky, more idiomatically numpy way to write this?


You could do something like this -

np.flatnonzero((goals[:,None,-1] > goals[:,:-1]).any(1))

Let's go through it in steps.

Step #1: We introduce a new axis on the sliced last column so that it stays 2D, with a singleton last axis. The idea is to compare each of its elements against all the other elements in the same row:

In [3]: goals[:,None,-1]
Out[3]:
array([[  1.],
       [  0.],
       [ nan],
       [  1.],
       [  1.]])

In [4]: goals[:,None,-1].shape # Check the shapes for broadcasting alignment
Out[4]: (5, 1)

In [5]: goals.shape
Out[5]: (5, 3)
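
For what it's worth, there are several ways to get that (5, 1) column, and as far as I know they hold the same values. This is only a sketch; the names a, b, c, d and shapes are made up for illustration.

```python
import numpy as np

goals = np.array([[np.nan, 0, 1],
                  [1,      2, 0],
                  [0,      0, np.nan],
                  [np.nan, 1, 1],
                  [0,      0, 1]])

a = goals[:, None, -1]           # new axis inserted while indexing
b = goals[:, -1][:, np.newaxis]  # index first, then add the axis
c = goals[:, -1:]                # a length-1 slice keeps the axis
d = goals[:, [-1]]               # list index keeps the axis too (returns a copy)

shapes = [x.shape for x in (a, b, c, d)]
print(shapes)
```

Which spelling to use is mostly a matter of taste; the slice form avoids None/np.newaxis entirely.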

Step #2: Next, we compare that column against all the other columns of the array, skipping the last column itself, since that is the slice obtained in Step #1. Note that any comparison involving NaN evaluates to False -

In [7]: goals[:,None,-1] > goals[:,:-1]
Out[7]:
array([[False,  True],
       [False, False],
       [False, False],
       [False, False],
       [ True,  True]], dtype=bool)
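
The False entries next to NaN values in that output come from how NaN compares. A small sketch of that behaviour follows; the variable names are illustrative only.

```python
import numpy as np

with np.errstate(invalid='ignore'):  # some NumPy versions warn on NaN comparisons
    nan_vs_number = np.array([np.nan]) > np.array([0.0])  # NaN on the left
    number_vs_nan = np.array([1.0]) > np.array([np.nan])  # NaN on the right

# Ordering comparisons with NaN are False in both directions, so a missed
# game can never count as "beaten" unless it is masked explicitly.
missed = np.isnan(np.array([np.nan, 0.0]))

print(nan_vs_number, number_vs_nan, missed)
```

This is why the version in the question ORs the comparison with np.isnan on the previous games.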

Step #3: Then, we are checking if there's ANY match along each row -

In [8]: (goals[:,None,-1] > goals[:,:-1]).any(axis=1)
Out[8]: array([ True, False, False, False,  True], dtype=bool)
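
One caveat worth checking against your own data: .any asks whether the final game beat at least one earlier game, while a personal record needs it to beat all of them. With the goals array above both give the same answer, but a hypothetical player (not in the original data) who scored 0, then 2, then 1 shows the difference. A rough sketch:

```python
import numpy as np

# Hypothetical player, not part of the original goals array.
extra = np.array([[0.0, 2.0, 1.0]])

beats = extra[:, None, -1] > extra[:, :-1]  # final game vs each earlier game

beat_some = beats.any(axis=1)  # True: 1 goal beats the 0-goal game
beat_all = beats.all(axis=1)   # False: 1 goal does not beat the 2-goal game

print(beats, beat_some, beat_all)
```

If .all is used, missed games need to be masked with np.isnan first, since a comparison against NaN is False and would otherwise rule the row out.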

Step #4: Finally, getting the matching indices with np.flatnonzero -

In [9]: np.flatnonzero((goals[:,None,-1] > goals[:,:-1]).any(axis=1))
Out[9]: array([0, 4])
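
An alternative sketch, not part of the approach above: since a record means beating the best previous game, np.nanmax can collapse the earlier columns first, which avoids adding an axis at all. The names previous_best, is_record and record_players are made up here.

```python
import numpy as np

goals = np.array([[np.nan, 0, 1],
                  [1,      2, 0],
                  [0,      0, np.nan],
                  [np.nan, 1, 1],
                  [0,      0, 1]])

# Best tally over the earlier games, skipping NaN (games not played).
previous_best = np.nanmax(goals[:, :-1], axis=1)

with np.errstate(invalid='ignore'):  # final game may itself be NaN
    is_record = goals[:, -1] > previous_best

record_players = np.flatnonzero(is_record)
print(is_record, record_players)
```

One assumption to be aware of: if a player missed every previous game, np.nanmax emits a RuntimeWarning and returns NaN for that row, so the comparison is False there. Whether a first appearance should count as a record is a choice the question does not settle.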