What's the right numpy syntax to compare one column against others in a 2d ndarray?
After reading some docs on array broadcasting, I am still not quite sure what the correct way to do this is.
Example: Suppose I have a 2d array of goals scored by each player (row) in each game (column).
import numpy as np

# goals[i, j] = number of goals scored by the ith player in the jth game
# (NaN if the player did not play)
goals = np.array([[np.nan, 0, 1],        # row = player, column = game
                  [1,      2, 0],
                  [0,      0, np.nan],
                  [np.nan, 1, 1],
                  [0,      0, 1]])
goals[:,2] > goals[:,:2]
ValueError: operands could not be broadcast together with shapes (5,) (5,2)
personalBest = (np.isnan(goals[:,:2]) |
                (goals[:,2][:,np.newaxis] > goals[:,:2]))
print(personalBest) # returns desired solution
You could do something like this -
np.flatnonzero((goals[:,None,-1] > goals[:,:-1]).any(1))
Let's go through it in steps.
Step #1: We introduce a new axis on the last-column slice to keep it 2D, with the last axis being a singleton dimension. The idea is to compare each of its elements against all the other elements in the same row:
In : goals[:,None,-1]
Out:
array([[  1.],
       [  0.],
       [ nan],
       [  1.],
       [  1.]])

In : goals[:,None,-1].shape # Check the shapes for broadcasting alignment
Out: (5, 1)

In : goals.shape
Out: (5, 3)
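To see why the extra axis matters, here is a minimal sketch that checks both shape combinations. It assumes the `goals` array from the question; the variable names `last_1d`, `last_2d` and `others` are only illustrative.

```python
import numpy as np

goals = np.array([[np.nan, 0, 1],
                  [1,      2, 0],
                  [0,      0, np.nan],
                  [np.nan, 1, 1],
                  [0,      0, 1]])

last_1d = goals[:, -1]        # shape (5,)
last_2d = goals[:, None, -1]  # shape (5, 1)
others = goals[:, :-1]        # shape (5, 2)

# Broadcasting aligns shapes starting from the trailing axis:
# (5,) vs (5, 2) pairs 5 with 2, which is incompatible.
try:
    np.broadcast(last_1d, others)
    broadcast_ok_1d = True
except ValueError:
    broadcast_ok_1d = False

# (5, 1) vs (5, 2) pairs 1 with 2, so the singleton axis is stretched.
broadcast_shape_2d = np.broadcast(last_2d, others).shape

print(broadcast_ok_1d, broadcast_shape_2d)  # False (5, 2)
```

The 1D slice fails for the same reason as the attempt in the question, while the `(5, 1)` version broadcasts to `(5, 2)`.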
Step #2: Next, we perform the comparison against all columns of the array except the last one, since that column is the sliced version obtained earlier -
In : goals[:,None,-1] > goals[:,:-1]
Out:
array([[False,  True],
       [False, False],
       [False, False],
       [False, False],
       [ True,  True]], dtype=bool)
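Note that any comparison involving NaN evaluates to False, which is why rows 0 and 3 show False in the first column. If games that were not played should count as "beaten", as in the question's `personalBest` expression, one option is to OR the comparison with an `np.isnan` mask. The sketch below assumes that interpretation; the names `greater` and `greater_or_missing` are illustrative.

```python
import numpy as np

goals = np.array([[np.nan, 0, 1],
                  [1,      2, 0],
                  [0,      0, np.nan],
                  [np.nan, 1, 1],
                  [0,      0, 1]])

last = goals[:, None, -1]   # shape (5, 1)
others = goals[:, :-1]      # shape (5, 2)

# Some NumPy versions warn when comparing against NaN; silence that here.
with np.errstate(invalid="ignore"):
    greater = last > others

# Treat games that were not played (NaN) as beaten.
greater_or_missing = np.isnan(others) | greater
print(greater_or_missing)
```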
Step #3: Then, we are checking if there's ANY match along each row -
In : (goals[:,None,-1] > goals[:,:-1]).any(axis=1)
Out: array([ True, False, False, False,  True], dtype=bool)
Step #4: Finally, we get the indices of the matching rows with np.flatnonzero -
In : np.flatnonzero((goals[:,None,-1] > goals[:,:-1]).any(axis=1))
Out: array([0, 4])
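Putting the four steps together, a self-contained sketch might look like this. The helper name `rows_where_last_beats_any` is hypothetical and not part of NumPy.

```python
import numpy as np

def rows_where_last_beats_any(arr):
    """Return indices of rows whose last entry exceeds at least one earlier entry.

    Hypothetical helper; comparisons against NaN evaluate to False.
    """
    last = arr[:, None, -1]   # step 1: shape (n_rows, 1)
    others = arr[:, :-1]      # all columns except the last
    with np.errstate(invalid="ignore"):
        mask = last > others  # step 2: broadcast comparison
    return np.flatnonzero(mask.any(axis=1))  # steps 3 and 4

goals = np.array([[np.nan, 0, 1],
                  [1,      2, 0],
                  [0,      0, np.nan],
                  [np.nan, 1, 1],
                  [0,      0, 1]])

result = rows_where_last_beats_any(goals)
print(result)  # [0 4]
```

Replacing `any` with `all` would instead require the last entry to exceed every earlier entry in the row; for this data only row 4 would qualify.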