Ritchie - 11 months ago 62

Python Question

I have two arrays:

```
np.array(y_pred_list).shape
# returns (5, 47151, 10)

np.array(y_val_lst).shape
# returns (5, 47151, 10)

np.array(y_pred_list)[:, 2, :]
# returns
array([[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

np.array(y_val_lst)[:, 2, :]
# returns
array([[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
```

I would like to go through all 47151 examples and calculate the "accuracy", meaning the number of examples in y_pred_list that match y_val_lst, divided by 47151. What's the comparison function for this?

Answer Source

Sounds like you want something like this:

```
import numpy as np
accuracy = (np.asarray(y_pred_list) == np.asarray(y_val_lst)).all(axis=(0, 2)).mean()
```

...though since your arrays are clearly floating-point arrays, you might want to allow for numerical-precision errors rather than insisting on exact equality:

```
accuracy = (np.abs(np.asarray(y_pred_list) - np.asarray(y_val_lst)) < tolerance).all(axis=(0, 2)).mean()
```

(where, for example, `tolerance = 1e-10`)
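As a rough sketch of the same tolerance idea, NumPy's `np.isclose` can do the comparison with an explicit absolute tolerance. The small arrays below (`y_val`, `y_pred`, shape `(5, 4, 10)`) are made-up stand-ins for the real data, and setting `rtol=0.0` is my assumption so that only the absolute tolerance applies. Note that `np.isclose` uses `<=` rather than `<`.

```python
import numpy as np

# Hypothetical stand-ins for the real arrays: 4 examples instead of 47151.
rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=(5, 4, 10)).astype(np.float32)

# Same values plus tiny floating-point noise, so exact equality fails everywhere.
y_pred = y_val.astype(np.float64) + 1e-12
# Deliberately make example 1 wrong by flipping every entry.
y_pred[:, 1, :] = 1.0 - y_pred[:, 1, :]

tolerance = 1e-10

# Elementwise test: |y_pred - y_val| <= atol + rtol * |y_val|, with rtol disabled.
close = np.isclose(y_pred, y_val, rtol=0.0, atol=tolerance)
accuracy = close.all(axis=(0, 2)).mean()

# For contrast: exact equality rejects every example because of the noise.
exact = (y_pred == y_val).all(axis=(0, 2)).mean()

print(accuracy, exact)
```

With this toy data, three of the four examples agree within the tolerance, while the exact comparison counts none of them as matches.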

The `.all(axis=(0,2))` call reduces over dimension 0 (the one that has extent 5) and dimension 2 (the one that has extent 10), giving `True` for an example only if everything along those two dimensions matches. It outputs a one-dimensional boolean array of length 47151. The `.mean()` call then gives you the proportion of matches in that sequence, which is my best guess as to what you mean by "over 47151".
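To make the shapes concrete, here is a minimal sketch on toy data. The arrays, with shape `(2, 3, 4)` in place of `(5, 47151, 10)`, are invented for illustration, and `matches` and `per_example` are just names I picked for the intermediate results.

```python
import numpy as np

# Hypothetical toy data: shape (2, 3, 4) stands in for (5, 47151, 10).
y_val = np.zeros((2, 3, 4), dtype=np.float32)
y_val[0, :, 1] = 1.0
y_val[1, :, 3] = 1.0

y_pred = y_val.copy()
# Corrupt example 2 (the last one) so that it no longer matches.
y_pred[1, 2, 3] = 0.0
y_pred[1, 2, 0] = 1.0

matches = (y_pred == y_val)              # elementwise booleans, shape (2, 3, 4)
per_example = matches.all(axis=(0, 2))   # one boolean per example, shape (3,)
accuracy = per_example.mean()            # fraction of examples that fully match

print(matches.shape, per_example.shape)
print(per_example)
print(accuracy)
```

Here the first two examples match in every position and the third does not, so the accuracy comes out as two out of three.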