Ritchie - 1 month ago 14
Python Question

# Comparing Arrays for Accuracy

I have two arrays:

``````
np.array(y_pred_list).shape
# returns (5, 47151, 10)
np.array(y_val_lst).shape
# returns (5, 47151, 10)

np.array(y_pred_list)[:, 2, :]
# returns
array([[ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

np.array(y_val_lst)[:, 2, :]
# returns
array([[ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]], dtype=float32)
``````

I would like to go through all 47151 examples and calculate the "accuracy", meaning the number of examples in `y_pred_list` that match `y_val_lst`, divided by 47151. What's the comparison function for this?

Sounds like you want something like this:

``````
accuracy = (np.asarray(y_pred_list) == np.asarray(y_val_lst)).all(axis=(0, 2)).mean()
``````

...though since your arrays are clearly floating-point arrays, you might want to allow for numerical-precision errors rather than insisting on exact equality:

``````
accuracy = (np.abs(np.asarray(y_pred_list) - np.asarray(y_val_lst)) < tolerance).all(axis=(0, 2)).mean()
``````

(where, for example, `tolerance = 1e-10`)
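If you'd rather not hand-roll the tolerance test, NumPy's `np.isclose` does an elementwise comparison with a tolerance. Here is a minimal sketch on toy data; the names `y_pred` and `y_val`, the tiny (2, 2, 2) shape, and the `atol` value are illustrative assumptions, not taken from your setup:

```python
import numpy as np

# Toy stand-ins: 2 outputs, 2 examples, 2 classes (shape (2, 2, 2)).
y_val = np.array([[[0., 1.], [1., 0.]],
                  [[1., 0.], [0., 0.]]], dtype=np.float32)

y_pred = y_val.astype(np.float64)
y_pred[0, 0, 1] += 1e-12   # tiny rounding-style error in example 0
y_pred[1, 1, 0] = 1.0      # genuine mismatch in example 1

# Absolute tolerance only; rtol=0.0 switches off the relative term.
close = np.isclose(y_pred, y_val, rtol=0.0, atol=1e-10)
accuracy = close.all(axis=(0, 2)).mean()
print(accuracy)  # 0.5
```

With exact `==`, example 0 would also be counted as a miss because of the 1e-12 perturbation.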

The `.all(axis=(0,2))` call checks, for each example, whether everything in its input is `True` (i.e. everything matches) along dimension 0 (the one that has extent 5) and dimension 2 (the one that has extent 10). It outputs a one-dimensional boolean array of length 47151. The `.mean()` call then gives you the proportion of matches in that sequence, which is my best guess as to what you mean by "over 47151".
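To see the reduction on something small enough to check by hand, here is a sketch with made-up data. The (2, 3, 4) shape and the names `y_pred` and `y_val` are assumptions standing in for your (5, 47151, 10) arrays:

```python
import numpy as np

# Toy stand-ins: 2 outputs, 3 examples, 4 classes.
y_val = np.zeros((2, 3, 4), dtype=np.float32)
y_val[0, 0, 1] = 1.0
y_val[1, 0, 3] = 1.0
y_val[0, 1, 2] = 1.0
y_val[1, 2, 0] = 1.0

y_pred = y_val.astype(np.float64)  # start from perfect predictions
y_pred[0, 1, 2] = 0.0              # then get example 1 wrong
y_pred[0, 1, 0] = 1.0

# Compare elementwise, then require a match across axes 0 and 2,
# leaving one boolean per example (axis 1).
per_example = (y_pred == y_val).all(axis=(0, 2))
accuracy = per_example.mean()

print(per_example.shape)  # (3,)
print(accuracy)           # 2 of 3 examples match
```

Only example 1 fails, so `per_example` has one `False` entry and the mean is 2/3.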

Source (Stackoverflow)