Larry Freeman - 5 months ago

numpy: setting duplicate values in a row to 0

I am working on a scoring method that rewards correct answers given over three attempts. For each row, the score is 1 if the first attempt is right, 1/2 if the second attempt is right, and 1/3 if the third attempt is right.

The total score is the sum of the scores for each row divided by the number of rows.
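For reference, here is a minimal sketch of that scoring rule in NumPy. It assumes a hypothetical answer key `key` holding one correct value per row (the question does not show how correct answers are stored). Each row earns 1/k for the first attempt k that matches, or 0 if none matches, and the total is the sum of row scores divided by the number of rows. Because it only looks at the first matching attempt, this variant happens not to depend on duplicates being zeroed; it is an illustration, not the asker's method.

```python
import numpy as np

def total_score(attempts, key):
    """Score an (n, 3) array of attempts against a hypothetical (n,) answer key."""
    attempts = np.asarray(attempts)
    key = np.asarray(key)
    matches = attempts == key[:, None]          # (n, 3) booleans: which attempts are right
    hit = matches.any(axis=1)                   # rows with at least one correct attempt
    first = matches.argmax(axis=1)              # index of the first True (0 if there is none)
    row_scores = np.where(hit, 1.0 / (first + 1), 0.0)  # 1, 1/2, 1/3, or 0
    return row_scores.sum() / len(row_scores)   # sum of row scores / number of rows

attempts = np.array([[11111, 22222, 22222],
                     [44444, 55555, 55555],
                     [33333, 33333, 33333],
                     [11111, 11111, 11111]])
key = np.array([22222, 44444, 33333, 99999])   # hypothetical answer key
print(total_score(attempts, key))              # (0.5 + 1 + 1 + 0) / 4
```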

The inputs to be scored are NumPy arrays of shape (n, 3), where n can be any number of rows.

print(inputs[0:5])
[[11111 22222 22222]
 [44444 55555 55555]
 [33333 33333 33333]
 [11111 11111 11111]]


To make this work, I need to set duplicate values to 0 to prevent double counting: if the first attempt equals the second, the second should be set to 0; if the second attempt equals the third, the third should be set to 0; and so on.

The above numpy array should be changed to the following:

[[11111 22222 0]
[44444 55555 0]
[33333 0 0]
[11111 0 0]]
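To see the double counting concretely, here is a sketch that assumes a row is scored by summing the weights 1, 1/2 and 1/3 of every attempt matching a hypothetical answer key `key`. With that kind of scoring, a repeated correct answer is counted once per repeat until the repeats are zeroed. It also assumes 0 is never a valid answer, since zeroed cells would otherwise be mistaken for correct ones.

```python
import numpy as np

weights = np.array([1.0, 1 / 2, 1 / 3])   # credit for attempt 1, 2, 3

def weighted_row_scores(attempts, key):
    # Sum the weight of every attempt that equals the (hypothetical) correct answer.
    matches = attempts == key[:, None]    # (n, 3) boolean mask
    return matches @ weights              # booleans act as 0/1 in the product

before = np.array([[33333, 33333, 33333]])
after = np.array([[33333, 0, 0]])         # the same row with repeats set to 0
key = np.array([33333])

print(weighted_row_scores(before, key))   # 1 + 1/2 + 1/3, about 1.83: over-counted
print(weighted_row_scores(after, key))    # 1.0, as intended
```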


What is the pythonic way to change duplicate values in a given row to 0 in a numpy array?

Answer

You could use np.diff to compare each attempt with the one before it and zero out the repeats:

import numpy as np
input[:, 1:] *= (np.diff(input, axis=1) != 0)
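The one-liner works because `np.diff(a, axis=1)` returns `a[:, 1:] - a[:, :-1]`, so each entry compares an attempt with the attempt immediately before it. `!= 0` turns that into a boolean mask, and multiplying the last two columns by the mask in place zeroes every attempt equal to its predecessor. A step-by-step sketch (the names `a`, `d` and `keep` are only for illustration):

```python
import numpy as np

a = np.array([[11111, 22222, 22222],
              [44444, 55555, 55555],
              [33333, 33333, 33333],
              [11111, 11111, 11111]])

d = np.diff(a, axis=1)   # shape (4, 2): column j+1 minus column j
keep = d != 0            # True where an attempt differs from the previous one
print(keep)
a[:, 1:] *= keep         # in the multiplication False acts as 0 and True as 1
print(a)
```

The mask is computed from the original values before the in-place multiplication, so a row like `[33333, 33333, 33333]` has both repeats zeroed. A column-by-column loop that zeroed as it went would instead compare the third value against an already-zeroed second value and keep it.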

Sample run:

In [19]: input
Out[19]: 
array([[11111, 22222, 22222],
       [44444, 55555, 55555],
       [33333, 33333, 33333],
       [11111, 11111, 11111]])

In [20]: input[:,1:] *=(np.diff(input,axis=1)!=0)

In [21]: input
Out[21]: 
array([[11111, 22222,     0],
       [44444, 55555,     0],
       [33333,     0,     0],
       [11111,     0,     0]])
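`np.diff` only compares adjacent columns, which matches the rule in the question, but it means a row such as `[11111, 22222, 11111]` keeps its third value. If any repeat of an earlier attempt should be zeroed instead, one possible sketch (an extension beyond what the question asks; `zero_earlier_repeats` is a hypothetical helper) compares every pair of columns with broadcasting and keeps only pairs where the matching column comes earlier:

```python
import numpy as np

def zero_earlier_repeats(a):
    """Return a copy of `a` where any value already seen earlier in its row is set to 0."""
    a = np.array(a, copy=True)
    k = a.shape[1]
    eq = a[:, :, None] == a[:, None, :]      # eq[r, j, i] is a[r, j] == a[r, i]
    earlier = np.tri(k, k, -1, dtype=bool)   # earlier[j, i] is True when i < j
    repeat = (eq & earlier).any(axis=2)      # column j repeats some column i < j
    a[repeat] = 0                            # mask came from the original values
    return a

x = np.array([[11111, 22222, 11111],
              [33333, 33333, 33333]])
print(zero_earlier_repeats(x))
```

Broadcasting builds an (n, k, k) comparison array, which is cheap for k = 3 attempts; on rows without non-adjacent repeats it gives the same result as the `np.diff` answer.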