C8H10N4O2 - 1 year ago 72
Python Question

# Broadcasting a comparison of one column of a 2D array against multiple columns

What's the right numpy syntax to compare one column against others in a 2d ndarray?

After reading some docs on array broadcasting, I am still not quite sure what the correct way to do this is.

Example: Suppose I have a 2d array of goals scored by each player (row) in each game (column).

```python
import numpy as np

# goals = number of goals scored by ith player in jth game (NaN if player did not play)
# column = game
goals = np.array([[np.nan, 0,      1],   # row = player
                  [     1, 2,      0],
                  [     0, 0, np.nan],
                  [np.nan, 1,      1],
                  [     0, 0,      1]])
```

I want to know if, in the final game, the player achieved a personal record by scoring more goals than she did in any previous game, ignoring games in which she did not appear (represented as `nan`). I expect `True` for only the first and last players in the array.
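To pin the rule down before vectorizing it, here is a plain-Python restatement that checks one row at a time. The helper name `is_personal_best` is hypothetical, and two edge cases the question leaves open are assumptions flagged in the comments: a `nan` in the final game never counts as a record, and a player with no earlier games on record counts as setting one.

```python
import math

import numpy as np


def is_personal_best(row):
    """Hypothetical reference check: does the last game beat every earlier game played?"""
    *previous, final = row
    if math.isnan(final):
        return False  # assumption: no record if she skipped the final game
    played = [g for g in previous if not math.isnan(g)]
    # assumption: with no earlier games played, any final score counts (all([]) is True)
    return all(final > g for g in played)


goals = np.array([[np.nan, 0, 1],
                  [1, 2, 0],
                  [0, 0, np.nan],
                  [np.nan, 1, 1],
                  [0, 0, 1]])

expected = [is_personal_best(row) for row in goals]
print(expected)  # [True, False, False, False, True]
```

A slow loop like this is mainly useful as a reference to test vectorized versions against.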

Just writing `goals[:,2] > goals[:,:2]` raises `ValueError: operands could not be broadcast together with shapes (5,) (5,2)`.
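The error follows from NumPy's broadcasting rule: shapes are compared from the trailing axis backwards, and each pair of lengths must be equal or one of them must be 1. A minimal sketch of that check, assuming NumPy 1.20 or later for `np.broadcast_shapes`:

```python
import numpy as np

goals = np.array([[np.nan, 0, 1],
                  [1, 2, 0],
                  [0, 0, np.nan],
                  [np.nan, 1, 1],
                  [0, 0, 1]])

last = goals[:, 2]       # integer index drops the axis: shape (5,)
previous = goals[:, :2]  # slice keeps the axis: shape (5, 2)
print(last.shape, previous.shape)  # (5,) (5, 2)

# (5,) vs (5, 2): the trailing lengths 5 and 2 differ and neither is 1.
try:
    np.broadcast_shapes(last.shape, previous.shape)
    compatible = True
except ValueError as err:
    compatible = False
    print("incompatible:", err)

# A trailing length of 1 stretches to match, so (5, 1) vs (5, 2) works.
print(np.broadcast_shapes((5, 1), previous.shape))  # (5, 2)
```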

What I tried: I know that I can manually turn the `(5,)` column into a `(5,1)` array with `np.newaxis`, which then broadcasts against the `(5,2)` block. So this works:

```python
with np.errstate(invalid='ignore'):
    personalBest = (np.isnan(goals[:, :2]) |
                    (goals[:, 2][:, np.newaxis] > goals[:, :2])
                    ).all(axis=1)

print(personalBest)  # returns desired solution
```
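For comparison, the same mask can be built without `np.newaxis` by slicing the last column with a range (`2:3`) instead of an integer, which keeps it two-dimensional. This slicing idiom is a swapped-in alternative, not part of the question's code; the logic is otherwise unchanged.

```python
import numpy as np

goals = np.array([[np.nan, 0, 1],
                  [1, 2, 0],
                  [0, 0, np.nan],
                  [np.nan, 1, 1],
                  [0, 0, 1]])

final = goals[:, 2:3]    # a slice, so the shape stays (5, 1) instead of (5,)
previous = goals[:, :2]  # shape (5, 2)

# Silence any NaN-comparison warnings, as in the question's version.
with np.errstate(invalid='ignore'):
    # A skipped earlier game (NaN) counts as beaten; every entry must be True.
    personalBest = (np.isnan(previous) | (final > previous)).all(axis=1)

print(personalBest.tolist())  # [True, False, False, False, True]
```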

Is there a less hacky, more idiomatic NumPy way to write this?

You could do something like this -

```python
np.flatnonzero((goals[:,None,-1] > goals[:,:-1]).any(1))
```

Let's go through it in steps.

Step #1: We introduce a new axis on the sliced last column to keep it `2D`, with the last axis being a singleton dimension. The idea is to compare each of its elements against the other elements in the same row:

```
In [3]: goals[:,None,-1]
Out[3]:
array([[  1.],
       [  0.],
       [ nan],
       [  1.],
       [  1.]])

In [4]: goals[:,None,-1].shape # Check the shapes for broadcasting alignment
Out[4]: (5, 1)

In [5]: goals.shape
Out[5]: (5, 3)
```
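`None` in an index does exactly what `np.newaxis` does in the question, because `np.newaxis` is simply an alias for `None`. A short sketch showing that three spellings give the same `(5, 1)` column; `np.array_equal(..., equal_nan=True)` assumes NumPy 1.19 or later.

```python
import numpy as np

goals = np.array([[np.nan, 0, 1],
                  [1, 2, 0],
                  [0, 0, np.nan],
                  [np.nan, 1, 1],
                  [0, 0, 1]])

a = goals[:, None, -1]            # the answer's spelling
b = goals[:, -1][:, np.newaxis]   # the question's spelling
c = goals[:, -1:]                 # slicing keeps the axis, no new axis needed

print(np.newaxis is None)         # True
print(a.shape, b.shape, c.shape)  # (5, 1) (5, 1) (5, 1)
# equal_nan=True so the NaN in row 2 compares as equal to itself
print(np.array_equal(a, b, equal_nan=True), np.array_equal(a, c, equal_nan=True))
```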

Step #2: Next, we perform the actual comparison against all columns of the array except the last one, since that column is the sliced version obtained earlier -

```
In [7]: goals[:,None,-1] > goals[:,:-1]
Out[7]:
array([[False,  True],
       [False, False],
       [False, False],
       [False, False],
       [ True,  True]], dtype=bool)
```
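What broadcasting does in this step can be made explicit with `np.broadcast_to`, which returns a read-only view of the `(5, 1)` column repeated across the two earlier games without copying data. A sketch confirming the explicit and implicit versions agree; note that every comparison involving NaN is `False`, which is why row 0 starts with `False`.

```python
import numpy as np

goals = np.array([[np.nan, 0, 1],
                  [1, 2, 0],
                  [0, 0, np.nan],
                  [np.nan, 1, 1],
                  [0, 0, 1]])

final = goals[:, None, -1]  # shape (5, 1)
previous = goals[:, :-1]    # shape (5, 2)

implicit = final > previous
# The (5, 2) view that broadcasting uses behind the scenes.
explicit = np.broadcast_to(final, previous.shape) > previous

print(np.array_equal(implicit, explicit))  # True
print(implicit.tolist())
# [[False, True], [False, False], [False, False], [False, False], [True, True]]
```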

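Step #3: Then we check whether there is ANY `True` along each row. Note that this tests whether the final game beats at least one earlier game, which gives the expected result for this data; the stricter personal-best rule in the question (beating every earlier game she played) needs `.all` combined with a NaN mask, as in the question's own version -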

```
In [8]: (goals[:,None,-1] > goals[:,:-1]).any(axis=1)
Out[8]: array([ True, False, False, False,  True], dtype=bool)
```
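To see where `.any` and the "beats every earlier game" reading diverge, here is a sketch with one hypothetical extra player, `[0, 5, 1]`, who beats her first game but not her second. The `.all` version treats skipped earlier games (NaN) as beaten, mirroring the question's mask, while still using the answer's `None` indexing.

```python
import numpy as np

goals = np.array([[np.nan, 0, 1],
                  [1, 2, 0],
                  [0, 0, np.nan],
                  [np.nan, 1, 1],
                  [0, 0, 1],
                  [0, 5, 1]])  # hypothetical extra player, not in the question

final = goals[:, None, -1]  # shape (6, 1)
previous = goals[:, :-1]    # shape (6, 2)

# "Beats at least one earlier game"
beats_some = (final > previous).any(axis=1)
# "Beats every earlier game she played": NaN entries count as beaten
beats_all = (np.isnan(previous) | (final > previous)).all(axis=1)

print(beats_some.tolist())  # [True, False, False, False, True, True]
print(beats_all.tolist())   # [True, False, False, False, True, False]
```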

Step #4: Finally, getting the matching indices with `np.flatnonzero` -

```
In [9]: np.flatnonzero((goals[:,None,-1] > goals[:,:-1]).any(axis=1))
Out[9]: array([0, 4])
```